Introduction

Abdominal aortic aneurysms (AAA) are characterized by localized weakening and dilation of the abdominal aorta1 that results from a cascade of mechanisms including transmural inflammation2, smooth muscle cell (SMC) apoptosis3 and extracellular matrix (ECM) degradation4 that collectively lead to the loss of aortic wall elasticity. AAA often coexists with other cardiovascular diseases5,6,7 and risk factors such as aging and smoking8. However, the physiopathology of AAA is uniquely distinguished by signaling programs that result in the exaggerated degradation of the core constituents of the elements of the ECM that lead to life-threatening aortic rupture.

SNPs are single base-pair variations that occur in the DNA either within or outside the coding region of genes and have the potential to interfere at different steps of gene expression depending on their genomic location9. For example, SNPs present in non-coding segments of the genome have been shown to modulate the efficacy of gene transcription by impeding the accessibility of transcription factors in these response elements10. The presence of SNPs in coding region of genes can give rise to mRNA with different bases at SNP site which could impact the proper translation of mRNA while the presence of SNPs within a coding sequence can lead to an amino-acid change and protein-misfolding9,11. Recently, it has been shown that the identification and estimation of variance by all SNPs from GWAS of conventionally unrelated individuals might be a significant determinant of genetic heritability to particular diseases12,13. Notably, previous genetic studies have reported that AAA occurrence can be heritable within families, reaching up to a 20% increase in AAA susceptibility within first-degree relatives14,15,16. Notably, a population-based twin study estimated the heritability of AAA to be as strong as between 70 and 77%17. However, while the inheritance of SNPs in AAA has not been directly studied in family cohorts, several gene polymorphisms such as COL3A1, MYH11, and TGFBR2 were also identified to be transmitted within a familial lineage of AAA18. Despite this evidence pointing to the familial risk associated with AAA, there is a paucity of information linking these genetic signatures and their underlying mechanistic significance in the development of AAA.

GWAS have facilitated the accessibility of disease-specific SNPs19. Identification of SNPs as predictive markers for disease risk has been used for Alzheimer’s20, migraines21, and coronary artery disease22. In AAA, GWAS of the Million Veteran Program has examined the genetic associations of genetic variants in a subset of AAA independent of family history23. As such, identification of AAA-specific SNPs would be beneficial to further characterize the pathogenic pathway associated with each SNP and refine our understanding of the knowledge gap between genomic inheritance and the molecular trajectory that lead to the degradation of the aortic wall that occurs during AAA.

In the current study, we performed an integrative analysis combining the available GWAS catalogs in AAA along with non-GWAS gene association studies published in the literature. We identified 86 SNPs related to AAA and its associated risk factors. We curated this result using multiple bioinformatic analyses to expose their potential phenotypical signals via which they could contribute to AAA development.

Methods

Literature search

Identification of studies was conducted through a literature search of PubMed, GWAS central, and GWAS registry. The search query used to retrieve potentially eligible studies from PubMed was “abdominal aortic aneurysm AND genetics”, “abdominal aortic aneurysm AND SNPs”, and “abdominal aortic aneurysm AND GWAS”. In addition, we searched for other possible SNPs outside of GWAS central and GWAS catalog by considering AAA candidate gene association studies with detailed clinical characteristics, reporting data on the associations between AAA and different SNPs. We only included published full-text articles until April 2022. The flow diagram of the study inclusion is described in Fig. 1.

Figure 1
figure 1

Flow diagram of study inclusion for the analysis.

Selection criteria and quality assessment

GWAS studies were included if they were listed in the GWAS directory with the trait label “abdominal aortic aneurysm” (EFO_0004214), with no other labels. Non-GWAS studies were considered eligible if they evaluated the association between AAA and genetic polymorphism through effect measures of odds ratio (OR), with 95% confidence interval (CI); if they reported allele and minor allele frequency (MAF) of population groups; and if controls were in Hardy–Weinberg equilibrium. We excluded the studies with data published only in abstract form, studies where MAF in controls were lower than 1% (rare variants), with sample size fewer than 10 cases or controls, studies that were rebutted by others and studies that combined data with other cardiovascular diseases. systematic reviews and animal in vivo studies were also not included in our analysis. We based our selection criteria on the principles proposed by the Human Genome Epidemiology Network (HuGeNet) for the meta-analysis of molecular association studies20.

The following clinical data were extracted from eligible AAA candidate gene association studies when available: age, gender percentage, aortic diameter (mm), smoking history (past or current), hypertension, diabetes, coronary artery disease (CAD), peripheral artery disease (PAD), and dyslipidemia (or hypercholesterolemia). SNPs frequency and genotype data of case and control groups were also extracted.

SNPs-gene annotation and pathway enrichment analysis

The SNPs used in Gene-set Enrichment Analysis were obtained from GWAS database with the trait label “abdominal aortic aneurysm” (EFO_0004214) and from other studies where allele frequency in AAA was reported to be 3% more than in controls. The SNPs were annotated using snpXplorer AnnotateMe tool24 with the following settings: SNPs-gene annotation or Gene-set Enrichment Analysis, GRCh38, GTEx tissue (Blood, Aorta, Coronary) and GO:BP. For each SNP input, SNPs-gene annotation provided the following information: chromosomal position; MAF; CADD-annotation based pathogenicity score (CADD v1.6), variant consequence, affected gene; GTEx based eQTL (expression quantitative-trait-loci) and sQTL (splicing quantitative-trait-loci); closest affected gene. Input SNPs approximate affected genes based on position within the genome, expression in GTEx tissue, or their direct coding gene.

Gene enrichment analysis was performed on AAA-related genes using Gene Ontology (GO) terms as the gene-set source. Clustering of the enriched GO terms was performed using REVIGO25, and annotated terms were selected through a semantic similarity matrix and a dynamic cut tree algorithm for term-based clustering24. The plug-in GeneMANIA in Cytoscape was used as an analytical method to provide an association network integration to predict gene function and gene–gene interaction of the SNPs-associated gene in this study26,27. GeneMANIA weighted each functional genomic dataset according to its predictive value according to its gene query while suggesting more genes with similar domain structure or physical interaction27. Gene interaction was plotted via Cytoscape.

A Mann–Whitney U test was performed using R software28 on patient clinical data. Comparisons of age, aortic diameter, male percentage, smoking history (past and current), hypertension, dyslipidemia, diabetes, CAD, and PAD were performed if data was available. Data were plotted using R studio (v.1.3.1093).

Results

Distribution of AAA SNPs in the genome and their gene regulation

The literature search identified 86 SNPs related to AAA, 48 of which originated from the recorded GWAS databases, and 38 were from AAA gene association studies in which SNPs frequency in AAA patients was at least 3% higher than in the control group. The variant-gene mapping procedure performed with SNPs snpXplorer AnnotateMe tool showed that most of the identified SNPs were annotated based on their chromosomal position (n = 49), followed by their eQTL GTEx tissue expression (n = 28) and direct gene coding region (n = 9) (Fig. 2A,B), implying the possible effect of the SNPs of interest in AAA. The specificity of each SNPs was observed since the majority of the SNPs (N = 57 variants) are mapped and annotated for one gene (Fig. 2C). We observed a random distribution of SNPs across different chromosomes. Chromosome 3 and 9 predominantly harbored most of the AAA SNPs (Fig. 2D). No SNPs were found on chromosome 17, X, and Y, suggesting that AAA SNPs identified in this study are not sex-linked (Fig. 2D).

Figure 2
figure 2

Genome Distribution of 86 SNPs Associated with AAA. (A) Most of the identified SNPs were annotated based on their chromosomal position (n = 49), followed by GTEx tissue expression (n = 28) and direct gene coding (n = 9). (B) The frequency of genes per genetic variant identified in this study. (C) The circular plot analyzed using snpXplorer AnnotateMe tool visualized the annotation type of each genetic variant (coding region, eQTL, or their positions) and their respective minor allele frequency and chromosomal distribution. (D) Distribution of SNPs across different chromosomes, with chromosome 3 and 9 having more AAA-related SNPs. (E) The complete SNP chromosome mapping of chromosome 3 and 9 as the top two locations of SNPs in AAA using a scale of 1:250,000 scale of pixels. (F) SNPs associated with AAA most commonly reside in their respective gene's intronic, regulatory, upstream, downstream, or intergenic regions.

On average, each chromosome contains 4 SNPs, with chromosomes 3 and 9 in contain the most SNPs (Fig. 2E). In the chromosome 3, we found 8 SNPs that affect a single gene (TGFBR2) and 1 SNP that affects 9 different genes (ITIH4, DNAH1, SFMBT1, GNL3, GLYCTK, NT5DC2, STIMATE, GLYCTK-AS1, MUSTN1). In the chromosome 9, we found 3 SNPs that affect a single gene (TGFBR1). The complete SNP chromosome mapping in a 1:250,000 scale of pixel to base pair can be seen in Supplementary Fig. 1.

Single SNP can affect multiple genes depending on their genomic location. Most of the SNPs associated with AAA reside in the intronic, regulatory, upstream, downstream, or intergenic region of their respective gene (Fig. 2F). In particular, SNPs with high CADD pathogenicity scores (> 10) result in non-synonymous, noncoding change, and stop gained mutations. A high CADD pathogenicity score is indicative of the deleterious effect of the SNP, as compared to other possible mutations within the human genome29.

SNPs and the associated genes are linked with high pathogenicity in AAA

We identified 15 SNPs affecting 20 genes with a CADD (Combined Annotation Dependent Depletion) pathogenicity score above 10 (Table 1). CADD scores correlate with pathogenicity, disease severity, regulatory effects, and complex trait associations. A score greater than 10 indicates that the nucleotide substitution is predicted to be the 10% most deleterious substitution within the human genome, a score of 20 or greater indicates the 1% most deleterious, a score of 30 or greater indicates the 0.1% most deleterious and so on. We found 1 SNP with a CADD score above 30 (rs5516, KLK1), 1 SNP with a CADD score above 20 (rs1801133, MTHFR), and 13 SNPs with a CADD score above 10.

Table 1 The list of SNPs with high CADD (combined annotation dependent depletion) pathogenicity score.

We confirmed the association between SNPs with high CADD pathogenicity scores with their frequency on the retrieved 38 AAA-gene association studies (Table 2). We identified four SNPs and associated genes consisting of the SNP rs5516 (KLK1), rs1800795 (IL-6), rs2230806 (ABCA1), and rs243865 (MMP-2) have both higher CADD scores and frequency in AAA patients. However, we observed that a high CADD score does not necessarily correlate with frequency in AAA.

Table 2 Top 20 upregulated SNPs from the retrieved AAA gene association clinical studies.

Biological traits associated with AAA SNPs

Using SnpXplorer AnnotateMe platform, gene enrichment analysis from the GWAS-catalog associations of the AAA SNPs showed strong correlations with various lipid metabolism pathways such as LDL-cholesterol measurement (52%), total cholesterol measurement (34%), triglyceride measurement (26%), HDL-cholesterol measurement (23%), and CRP measurement (14%) (Fig. 3A). Only 49% of all tested SNPs were found to be directly labeled with the AAA trait, this may suggest that the remaining 51% have an indirect association with AAA occurrence through various lipid metabolism pathways and genes. As for associations with other cardiovascular diseases, AAA and CAD share 25% of the same SNPs, while AAA and MI only share 10%.

Figure 3
figure 3

GWAS traits of AAA SNPs. GWAS-catalog Traits of 86 SNPs and Genes Associated with AAA, shown in a fraction of (A) total SNP (86) or (B) genes (130). The numbers presented are fractions of the GWAS-catalog traits from all tested SNPs or genes.

In addition to SNPs, we present the GWAS-catalog associations of the genes associated with AAA SNPs. AAA-SNPs associated genes have correlations with various diseases such as coronary artery disease (21%), myocardial infarction (12%), and type II diabetes (10%). AAA SNPs-associated genes were also found to be associated with genes relating to total cholesterol measurement (17%), LDL-cholesterol measurement (15%), CRP measurement (12%), HDL-cholesterol measurement (10%), and triglyceride measurement (9%) (Fig. 3B).

Gene ontology analysis of AAA SNPs

Gene-set enrichment analysis was done with the genes associated with AAA SNPs, using Gene Ontology (GO) as the gene-set source. The gene-set enrichment was then visualized using REVIGO25. Annotated GO terms were selected using a 2-step process as described by Tesi et al. (2021) in the snpXplorer web server24. The GO term clustering and annotation in Fig. 4 depict the most prominent and significant biological processes and associated genes in AAA based on the snpXplorer annotation algorithm.

Figure 4
figure 4

REVIGO Gene Ontology term clustering of 130 AAA genes. The position of each cluster within the semantic space does not matter. Semantically similar GO terms are positioned closer together in the plot.

The AAA-associated genes were found to be prominently involved in cell population proliferation, followed by regulation of cell population proliferation (p = 3.19 × 10−8), muscle cell proliferation (p = 3.08 × 10−6), regulation of smooth muscle cell proliferation (p = 4.97 × 10−6), regulation of plasma lipoprotein particle levels (p = 1.58 × 10−5), regulation of protein kinase activity (p = 2.15 × 10−5), regulation of phosphorus metabolic process (p = 2.77 × 10−5), and blood circulation (p = 4.63 × 10−5). Images processed using SNPsXplorer showing that these annotated terms were found to be most significant according to the semantic similarity (Supplementary Fig. 2) and a dynamic cut tree algorithm for term-based clustering and p values (Supplementary Fig. 3). The most significant gene ontology terms associated with the SNPs associated genes are summarized in Table 3.

Table 3 List of top 20 gene ontology terms associated with the total SNPs associated genes in AAA.

Significant signaling networks associated with AAA SNPs

To assess the interaction between SNPs associated genes related to AAA, gene–gene interactomes were constructed and plotted using GeneMANIA plugins in Cytoscape (Fig. 5). The most significant Genes related to top 25 GO annotated functions (Table 4) were included to further grouped and visualized on the network module to see their network association according to their shared pathways, co-localization, co-expression, and physical interaction (Fig. 5A). As a hallmark of pathological remodeling in AAA, we observed the strongest interaction between TGFB1 with the TGFB1 receptor family and notably strong co-localization with MMP2 and MMP9, suggesting the role of TGFB1 pathway as one of the reminiscent factors that could link the presence of SNPs to the development of AAA. This analysis also includes several additional genes predicted to convey a strong interaction with some of our gene candidates. 14 genes were associated with lipid metabolism pathways; regulation of plasma lipoprotein level, regulation of lipid localization, regulation of lipid transport, and regulation of cholesterol transport (Fig. 5B). 10 genes were associated with extracellular matrix (ECM) organization (Fig. 5C). 11 genes were associated with smooth muscle cell proliferation pathways (Fig. 5D). 10 genes were associated with reactive oxygen species metabolism (Fig. 5E). IL-6, TGFB1, and TGFB1 receptor family shared three modules, the most out of all genes. IL-6 is involved in lipid metabolism, ECM organization, and smooth muscle cell proliferation pathways. While TGFB1 and the TGFB1 receptor family are involved in ECM organization, smooth muscle cell proliferation, and ROS metabolism pathways.

Figure 5
figure 5

GeneMANIA interaction network from the SNPs associated genes in AAA. (A) Circular plot of network association according to top 20 gene ontology annotated function. Transforming growth factor-beta (TGFB1) shares strong physical interaction with its receptor; transforming growth factor beta receptor 1, 2, and 3 (TGFBR1, TGFBR2, TGFBR3) as well as a strong co-localization with matrix metalloproteinase 9 (MMP9). The network of hub genes associated with AAA-related pathways consists of (B) lipid metabolism pathway, (C) extracellular matrix (ECM) organization, (D) smooth muscle cell proliferation, and (E) reactive oxygen species (ROS) metabolism is hierarchically plotted to show their interaction. The color depth of nodes represents the corrected P-value.

Table 4 The enriched Gene ontology (GO) term and pathways from the SNPs associated gene queries according to the GENEmania plugin of Cytoscape.

Clinical characteristics of AAA patients

Out of the AAA candidate gene association studies in the literature, we found 15 studies with complete clinical data, totaling a sample size of 10.956 (5676 control & 5280 case). The available clinical data were age (mean), men (n), aortic diameter (mm, mean), smoking (current & past), hypertension, diabetes, CAD (coronary artery disease), PAD (Peripheral Artery Disease), and dyslipidemia. Data variabilities between each variables are different depending on the availability. A detailed summary of each study can be found in Supplementary Table 1.

The non-disease control and AAA case groups shared a similar average age (69.1 and 70.9 years). Several clinical characteristics, such as a history of smoking (past or current, p = 0.037), hypertension (p = 0.013), and dyslipidemia (p = 0.042), were positively associated with AAA. Conversely, there were no significant differences in the presence of diabetes, CAD, and PAD between the two populations. This association is summarized in Fig. 6.

Figure 6
figure 6

AAA SNPs association with disease phenotype and risk factors. Boxplot representing the association of several identifiable risk factors of AAA between the case group and non-disease controls. Hypertension (P = 0.013), dyslipidemia (P = 0.042), and smoking (current or history) (P = 0.037) are associated with AAA. The sample size consists of 10.956 individuals (5676 non-disease control & 5280 AAA cases).

Discussion

As a complex multifactorial disease, there is little evidence pinpointing the specific genetic predisposition that could contribute to the development of AAA. Previous studies have identified several SNPs associated with AAA, but little is known about the underlying mechanisms and biological significance of the identified SNPs to AAA pathobiology. In this integrative in silico analysis, we used snpXplorer AnnotateMe platform to merge the SNPs candidate from the GWAS catalog and previously published AAA clinical cohorts. We further identify several SNPs with possible pathological signaling pathways associated with AAA as well as their correlative risk factors in these cohorts.

In the present study, we calculated the frequency of the top 20 SNPs in AAA compared to non-disease control and measured the CADD pathogenicity score in order to observe their association with the risk of developing AAA. Here, we identified SNP rs5516, a stop-gained mutation for KLK1 (Kallikrein 1), with the highest pathogenicity score and a significantly high frequency in AAA (17.8%). Indeed, previous SNPs studies performed in both Australian and Asian cohorts have shown the association of rs5516 with AAA30,43. Our analysis identified other interesting polymorphism profiles associated with significant genes that correlated with AAA. We identified a high frequency of rs1800629 for TNF-α, and rs1800795 for IL-6, inflammatory cytokines well-known to mediate the inflammatory response and smooth muscle cell proliferation during AAA development44,45,46. Indeed, Jablonska et al. reported that the mutation detected in both of the alleles increased the risk of AAA formation among heterozygous carriers31. However, our result show that CADD score did not necessarily correlate with the frequency of AAA in the observed studies. The SNP rs1801133, located in the coding region of MTHFR (Methylenetetrahydrofolate Reductase), was one of the highest-scoring SNPs, yet it’s CT/CC allele was only 3% more expressed in Greek AAA patients than in controls34. However, in this study, the analysis of the SNPs in the UK cohort revealed that the frequency of SNP rs1801133 was significantly more frequent. This is most likely due to the population, socio-economical, environmental and genome differences in these two cohorts living in two different geographic areas. Indeed, the meta-analysis performed for SNPs on MTHFR showed a discrepancy regarding the protective or pathogenic role of either SNP rs1801133 or MTHFR47,48,49. Therefore, this result contextualizes the plasticity of population-specific SNPs and its phenotypic effect to the development of AAA.

Several other SNPs with high frequency in AAA such as rs2230806 (ABCA1), rs3775290 (TLR3), and rs10757278 (CDKN2B) were also strongly associated with significant AAA pathways50,51,52. The loss of CDKN2B promotes p53-dependent smooth muscle cell apoptosis and aneurysm formation. In parallel, our analysis revealed high pathogenicity of regulatory CDKN2B SNPs (CADD score: 10.42), where this SNP was observed in Chinese and European cohorts39,53 and mark the possible genetic-environment interplay in AAA development. However, we were unable to validate its shared associated network in AAA-related pathways along with other gene candidates listed in this study. Therefore, a deeper understanding of the role of CDKN2B in AAA is warranted.

The profile of SNPs found in the DNA between genes can act as an essential biological marker with a strong association with the disease. Interestingly, our descriptive analysis conducted from the gene association clinical studies described significant traits related to the well-described risk factors of AAA such as CAD and dyslipidemia pointing a possibility that specific SNPs profile might factor on the progression of these risk factors, and eventually fuel the expansion of AAA. For example, our analysis showed rs429358 polymorphism in APOE, a canonical transporter of cholesterol particles54, had a significant pathogenicity score in AAA patients. However, this SNP was found to be unassociated with AAA in an Australian cohort55. This discrepancy could rely on the characteristics of the cohorts and the type of analysis performed in each study. Genotyping of specific APOE alleles was performed in the Australian AAA cohort in 640 samples compared to GWAS population study from 7600 AAA cases using DNA sequencing cross referenced with DNA variants library from European-descent veterans across USA23. The sensitivity of the meta-analysis, difference in geographic cohorts studied and the power of the sample size were likely more amenable to capture associations of AAA with rs429358 APOE SNPs polymorphism. This further emphasizes the need to perform large-scale worldwide multi-centered studies including populations from disparate groups to uncover the full spectrum of drivers of AAA genetic vulnerability.

Notably, a previous multi-ethnic cohort study has described that the polymorphism of APOE had a significant association with dyslipidemia in Asian ethnic groups56. The APOE gene is polymorphic and co-exists as APOE-ε2, -ε3 and -ε4 alleles57. APOE-ε4 has been shown to associate with increased in LDL-cholesterol levels and higher cardiovascular risk58. The highest frequency of APOE-ε4 carrier was observed in the Malay population and was associated with high LDL-C levels. These observations suggest that the inherent geographical genetic background of certain ethnic groups could present a diverse portfolio of SNPs associated with AAA, which warrants further studies.

The main strength of this study is the stringency of our method to include the most recent GWAS catalogs in AAA and combine the data with other available cohorts in the literature to comprehensively identify SNPs with a strong association with AAA. We have summarized and visualized the genomic distribution and frequency of each SNP using the most recent bioinformatic software (snpXplorer)24 to identify their associated molecular pathways and variant consequences. We have integrated this result into pathway enrichment analysis to focus on the biological interaction of the associated genes in the AAA pathogenesis.

We acknowledge several limitations in our study. Considering the limited genomic studies performed in AAA, the number of samples included in our analysis is lower than other similar in silico SNPs analysis in other diseases. This limitation hindered us from performing a meta-analysis on each SNPs candidate to validate its consistency between each geographical origin and to confirm the region-specific susceptibility of AAA59. Indeed, the small number of cohorts is also a major limitation in performing and assessing the predictive potential of these SNPs candidates in AAA. Moreover, the technological advances in the development of sequencing such as whole-exome sequencing could facilitate the identification of SNPs in the human genomic data. However, this would require significant financial resources and might not be applicable in each clinical setting, thus limiting the potential to use SNPs profile as a predictive marker to identify AAA.

In conclusion, we have identified a significant profile of polymorphism associated with important risk factors such as dyslipidemia and the main pathobiological pathways of AAA development. Further investigation in large population studies will be necessary to confirm this finding and to finally reveal the specific genetic heritage of individuals carrying a risk of developing AAA.