Introduction

Placental growth factor (PlGF) is a member of the vascular endothelial growth factor (VEGF) family. It is a cytokine with a non essential role in healthy conditions, but with a specific involvement in several malignant, inflammatory and ischemic disorders1,2,3. High levels of circulating PlGF have been observed in individuals with various diseases such as cancer (breast4, melanoma5, leukemia6), immune diseases (rheumatoid arthritis7, Systemic Lupus Erythematosus8), metabolic syndrome9, coronary artery disease10 and neovascular age-related macular degeneration (nAMD)11. PlGF has recently emerged as a predictor of survival and cardiovascular risk and for cardiovascular risk stratification in patients with chronic kidney disease (CKD)12. Women affected by preeclampsia, a pregnancy-specific disorder characterized by the development of hypertension and proteinuria in the later stages of gestation13, show lower PlGF circulating levels compared to healthy pregnant women14,15,16. All these data show that the amount of PlGF has a relevant impact on the determination of pathological conditions. Moreover, since PlGF levels can be pharmacologically modifiable, understanding the determinants of circulating PlGF could support efforts directed at risk prediction, prevention and therapy.

Although the heritability of circulating PlGF levels is around 40%17, the knowledge about the genetic factors modulating circulating PlGF levels is limited. Indeed, no studies have assessed the relation between circulating PlGF levels and genetic variants at the genome-wide level. In the present work, we have conducted the first genome-wide association study (GWAS) for PlGF in a general population sample with a deep genomic coverage based on imputation to the 1000 Genomes panel, to identify genetic variants that explain variation in circulating PlGF concentrations.

Results

GWAS of PlGF plasma levels: discovery, replication and meta-analysis

A GWAS meta-analysis of PlGF circulating levels was performed in 2068 individuals from three isolated populations of Cilento, South Italy18,19. Characteristics of the study participants are presented in Table 1. A sample of 1600 individuals was used as discovery cohort. An additional sample of 468 individuals from the same populations was used in the de novo replication stage. The replication sample was younger than the discovery sample, no other differences between the two cohorts were observed.

Table 1 Characteristics of study participants from the discovery and replication cohorts.

In the discovery GWAS 8,281,256 autosomal SNPs were investigated for association with PlGF plasma levels. A Manhattan plot and a Q-Q plot of the association results are reported in Fig. 1 and S1 Figure, respectively. Although none of the signals in the discovery cohort reached genome-wide significance (p-value < 5 × 10–8), 88 variants were associated at p-value < 1 × 10–5 and, among those, 7 variants were associated at p-value < 1 × 10–6. A linkage disequilibrium (LD)-based clumping procedure revealed 5 independent signals, suggestively associated with PlGF plasma concentration at a p-value < 1 × 10–6. The regional association plots reported in the S2 Figure provide a detailed overview of those loci. Those variants were carried forward to a de novo replication. Summary results for main associations for PlGF plasma levels are shown in Table 2. Overall, in the meta-analysis of discovery and replication, a variant on chromosome 10p11.22 reached a genome-wide significant p-value; two additional signals, on chromosomes 13q12.3 and 7p15.3, although didn’t reach the genome-wide significance, showed an effect in the same direction between discovery and replication and a lower p-value in the meta-analysis compared to that of the discovery stage. Therefore, based on these criteria, we considered the 13q12.3 and 7p15.3 loci as replicated signals. In the locus on chromosome 10p11.22, the most significantly associated variant was rs17296631 (p-value = 8.36 × 10–9) located upstream Neuropilin 1 (NRP1) gene, which encodes for a co-receptor of PlGF protein. The rs9551465 variant (p-value = 7.84 × 10–8), identified on chromosome 13q12.3, was located in the 3’UTR of the Fms Related Tyrosine Kinase 1 (FLT1) gene, coding for the PlGF receptor Flt-1/VEGFR-1. The variant rs77619310 (p-value = 5.27 × 10–7) on chromosome 7p15.3 was located in an intron of the RAPGEF5 gene.

Figure 1
figure 1

Manhattan plot of genome-wide association results in discovery analysis. Manhattan Plot showing − log10(p-values) for all SNPs of the PlGF discovery GWAS ordered by their chromosomal position. The blue dashed line indicates the suggestive association threshold (p-value < 1 × 10–6), while the red dashed line indicates the genome-wide significant threshold (p-value < 5 × 10–8). Blue dots are the SNPs associated at the suggestive significance level (p-value < 1 × 10–6). For each locus, the nearest gene is reported.

Table 2 Top variant results from discovery and replication meta-analysis.

The three most associated loci collectively explained 3.68% of the PlGF phenotypic variance.

Colocalization analysis

To verify if the variants associated with PlGF circulating levels directly act through gene expression regulation in a particular tissue, and considering that, given the involvement of PlGF in different pathological conditions, it might exert a role in several tissues, we tested the three signals on chromosomes 7, 10 and 13 for colocalization with expression quantitative trait loci (eQTL) in all the 48 tissues from Genotype-Tissue Expression (GTEx)20 using the coloc program21 (https://rdrr.io/cran/coloc/man/coloc.abf.html).

We highlighted a very high posterior probability of colocalization (H4 = 0.96) in the Artery Tibial tissue for the rs77619310 variant, which is the best GWAS hit on chromosome 7, and an eQTL for the RAPGEF5 gene. The A allele, reported to determine an increase of RAPGEF5 mRNA levels, was also associated with higher levels of PlGF.

This analysis also revealed that the rs9551465, the most associated variant on chromosome 13, showed the best colocalization signal (H4 = 0.79) in the Thyroid tissue, where it is also reported as an eQTL for the FLT1 gene. In this case the allele associated with an increase of PlGF levels was correlated with a lower expression of the FLT1 gene.

The best signal on chromosome 10, the rs17296631, is in complete LD with rs145141871, the second most associated variant in this locus, and both are reported as eQTLs for the NRP1 gene in the Brain Anterior cingulate cortex tissue. For both variants, the alleles that lead to higher expression levels of the NRP1 gene are also associated with an increase of PlGF levels. Despite this evidence, coloc did not reveal significant colocalization evidence, confirming that in such a situation of complete LD between two variants, the program shows uncertainty in identifying the causal one21.

Transcription-wide and gene-based association analyses

To further identify loci associated with PlGF levels, we performed a transcription-wide association analysis (TWAS) and a gene-based analysis.

In our study, TWAS analysis identified several genes in different genomic regions whose expression in particular tissues, is associated with circulating levels of PlGF protein. The list of the 52 gene/tissue pairs, showing a significant association at FDR (< 0.05), is reported in the S1 Table. Interestingly, the majority of them are located on chromosome 6p22.1-p21.33, in the HLA region. These results suggest that multiple variants, each likely with a marginal level of significance and located in the HLA genomic region, might contribute, in aggregate, to the regulation of the PlGF circulating levels.

The top 10 genes from the VEGAS2 gene-based analysis22 are reported in the S2 Table. Although none of the analysed genes showed a statistically significant False Discovery Rate (FDR)-adjusted p-value, the best association was found for the FLT1 gene, in accordance with the result obtained in the GWAS. Furthermore, except for the SMIM15-AS1 gene on chromosome 5, the other associated genes are all located in the HLA region on chromosome 6, in conformity with the results obtained in the TWAS analysis, and again suggesting an involvement of this locus on the regulation of the PlGF levels.

Pathway analysis

To discover biological pathways involved in the modulation of the circulating levels of PlGF, we performed an enrichment analysis using the GSA-SNP2 program23. The analysis highlighted 27 Gene Ontology (GO) significantly enriched pathways (q-value < 0.05) (16 Biological Processes, 5 Cellular Components and 6 Molecular Functions), reported in Table 3. In line with the results obtained in the TWAS and the gene-based association analysis, the majority of the significantly enriched pathways were related to immune response. In fact, the GO terms summarization analysis performed by REVIGO program24 evidenced that 15 out of the 16 Biological Processes were linked to the immunoglobulin mediated immune response, and that 2 Cellular Components and one Molecular Function were related to the MHC class II complex. Also, 3 GO terms were linked to the lumenal side of the endoplasmic reticulum membrane. Other significantly over-represented pathways were the N-glycan processing, the oxidoreductase activity, the mismatched DNA binding, the folic acid binding and the AU-rich element binding.

Table 3 Significantly enriched pathways.

Discussion

Our study is the first GWAS of circulating PlGF levels. It was undertaken in 2068 individuals from three population isolates of the Cilento area, South Italy, and represents the largest survey of PlGF measurement in a population-based sample. In this study, we have identified 3 chromosomal regions (7p15.3, 10p11.22 and 13q12.3) harboring variants associated with PlGF plasma levels. The three top variants in these loci explain about 4% of PlGF phenotypic variance.

The newly identified regions include many interesting and plausible candidate genes.

The lead variant on chromosome 10, rs17296631, is located in an intergenic region upstream the NRP1 gene. This gene encodes for the Neuropilin-1, a membrane protein devoid of tyrosine kinase activity which acts as a co-receptor for PlGF. Recent studies suggest that NRP-1, through the binding of PlGF and other growth factors, can play an important role in the activation of specific signal transduction pathways also independently of other membrane receptors5,25. A positive correlation of expression of PlGF and NRP-1 has been observed in breast cancer4, and PlGF activation of NRP-1 could also promote tumor cell survival in a paracrine manner in a mouse model of medulloblastoma25. These observations are in line with the results of the eQTL analysis, in which the T allele of the rs17296631 variant is associated with an increase of both PlGF levels and NRP1 gene expression.

On chromosome 13, the rs9551465 is located in an intron of the FLT1 gene. This gene encodes for a tyrosine-protein kinase, Flt-1, belonging to the vascular endothelial growth factor receptor (VEGFR) family, which functions as a receptor for VEGFA, VEGFB and PlGF. The protein is also present in a soluble form, sFlt-1, lacking of the transmembrane and intracellular domains. The PlGF binding to Flt-1 stimulates angiogenesis via both direct and indirect mechanisms: the activation of Flt-1 by PlGF results in phosphorylation of specific tyrosine residues in Flt-1 and downstream signaling different from those activated by VEGFA binding; also, it has been proposed, based on in vitro data and overexpression studies, that the binding of PlGF to Flt-1 induces pro-angiogenic effects as PlGF shifts VEGFA from Flt-1 to VEGFR2, enhancing the effects of VEGFA26. In the present study, we have found that the PlGF associated rs9551465 variant in the FLT1 gene is also an eQTL for FLT1: in particular, the variant allele (T) is associated with higher PlGF protein levels and with a lower FLT1 gene expression. This is consistent with the observation that a higher expression of Flt-1, acting as a “decoy” for PlGF, can determine lower detectable circulating levels of the ligand protein14.

The third associated variant, rs77619310, is located on chromosome 7p15.3 in an intron of the RAPGEF5 gene. This gene, encoding for a guanine nucleotide exchange factor (GEF) for the GTPases Rap1, Rap2 and M-RAS, serves as RAS activator by promoting the acquisition of GTP to maintain the active GTP-bound state and is a key link between cell surface receptors and RAS activation27,28. Interestingly, Rap1, activated by different GEFs, including RAPGEF5, acts as a regulator of several basic cellular functions such as adhesion, polarity, differentiation and growth29,30. In the endothelium, Rap1 is a key positive regulator of angiogenic process31. It functions downstream of the main angiogenic growth-controlling receptors in endothelial cells, including VEGFA32. Also, Rap1 promotes VEGF-mediated angiogenesis through the activation of VEGFR233 and is capable of regulating endothelial barrier permeability by VEGF stimulation34.

In a previous candidate gene study, we reported the association of the PlGF gene locus with the levels of the protein in plasma17. More recently, another study has found an association between an additional polymorphism in the PlGF gene and the protein plasma levels35. In the current study, no variants in the PlGF gene region reached the statistical significance, however, some SNPs located between 5 and 20 kb upstream of the gene were nominally associated with PlGF levels (rs4903273, p-value = 2.04 × 10–3) and showed a moderate LD (r2 = 0.56) with the rs2268614 SNP described in our previous work17.

An implication of HLA genes in the regulation of PlGF levels was detected at statistical significance in the TWAS and also suggested by a gene-based association analysis, confirming an increased power of TWAS analysis which takes advantage from gene expression level data36. In particular, such genes belong to the HLA class III and are involved in both the immune system function and the angiogenic process. In detail, gene expression levels of LY6G5B and LY6G5C, belonging to the cluster of leukocyte antigen-6 (LY6) genes, showed an association with PlGF circulating levels in multiple tissues. Little is known about the proteins coded by these two genes, however other members of the LY6 protein family, expressed in various types of tissues and at different stages of cell differentiation, are involved in cell proliferation, cell migration, cell–cell interactions, immune cell maturation, macrophage activation, and cytokine production37,38, and their overexpression or dysregulation is associated with tumorigenesis and autoimmune diseases39. Also, the C4A gene, encoding for the acidic form of the complement factor 4, plays a pivotal role in the activation of immune defenses and the clearance of immune complexes or apoptotic debris in vitro and in vivo studies40,41. A lower number of gene copies of C4A has been linked to an increased risk for different autoimmune diseases, such as Systemic Lupus Erythematosus, Type 1 Diabetes and Juvenile Dermatomyositis42,43,44. In addition, the deficiency of the C4A gene has also been associated with preeclampsia, a well-established PlGF related disease14,15,16,45,46, with a lower gene copy number associated with an increase of disease severity, supporting the importance of the classical pathway of the complement system in this pathology47. Also, higher levels of C4A have been observed in patients with neovascular age-related macular degeneration48,49, an ocular pathology also characterized by an increase of PlGF levels11. Finally, some evidences link the pseudogene MICD to the immune system: in fact, SNPs in this gene region have been associated with eosinophil, basophil and granulocyte count50 and with different autoimmune disease, such as vitiligo51,52, Graves' disease53 and psoriatic arthritis54.

Other genes in the same region, the Dimethylarginine dimethylaminohydrolase-2 (DDAH2) and the TCF19, are implicated in the angiogenic process. The DDAH2 gene encodes for an enzyme that positively regulates the nitric oxide (NO) generation by metabolizing the asymmetric dimethylarginines (ADMA), which are inhibitors of the nitric oxide synthase (NOS) activity55. DDAH2 acts as a key regulator of the angiogenesis: in fact, its overexpression enhances the proliferation and migration of endothelial cells through the induction of expression and secretion of VEGF, both regulating the production of endothelial NO through the stimulation of endothelial NOS (eNOS) activity56, and in an eNOS-independent manner, via the activation of Sp1 binding site of the VEGF promoter57. DDAH2 is considered an antiatherosclerotic molecule: in fact, hypermethylation of DDAH2 promoter, accompanied by its reduced expression, correlates with endothelial dysfunction in patients affected by Coronary Artery Disease58. The inhibition of DDAH2 has also been correlated with an attenuation of aberrant angiogenesis and with the improvement of vascular regeneration in mice models of oxygen-induced retinopathy (OIR)59, and similar effects on the retinal vasculature have also been observed in PlGF deficient OIR mouse models1. Also, decreased expression levels of DDAH2 have been observed in women affected by preeclampsia60,61.

The TCF19 gene encodes a transcription factor containing a PHD-type zinc finger domain and plays a role in the proliferation and apoptosis of pancreatic beta cells62. SNPs in this gene have been associated with chronic Hepatitis B, Type 1 and Type 2 Diabetes63,64,65. Recently its overexpression has been linked to an increase of cell proliferation in different types of cancer66,67.

It is of interest to note that, although colocalization and TWAS analyses both use GTEx expression data, the PlGF associated genes, which colocalize with eQTL signals, were not detected by TWAS analysis. From this observation, we noted that in the PredictDB database (http://predictdb.org/) the prediction models used for the imputation of the expression levels are present, depending on the tissue, for a limited and variable number of genes. Also, imputation is performed using a less dense panel of variants compared to that used in the GWAS. Both these circumstances might explain the possibility of bypassing some association signals in the TWAS. In our analysis, NRP1 and FLT1 imputed genes expression levels are missing for those tissues (Brain Anterior cingulate cortex and Thyroid, respectively) in which the colocalization was identified. For the RAPGEF5 gene, for which a very high posterior probability of colocalization was evidenced in the Artery Tibial tissue, the TWAS failed to detect an association (p-value = 0.017), probably because imputed expression levels were obtained using a reduced number of variants (N = 27), not including any of the PlGF associated variants detected in the GWAS at this locus.

The pathway analysis also revealed an over-representation of several pathways linked to the immune system. To date, only a few studies have linked PlGF to the adaptive immune response. Lin et al. reported the evidence of immunosuppressive properties of PlGF. They found that PlGF inhibited the activation and maturation of human dendritic cells, differentiated from CD14 + monocytes, through the NF-κB signaling pathway. PlGF-treated dendritic cells resulted in the downregulation of maturation markers CD80, CD86, CD83, CD40, and MHC-II expression as well as the inhibition of IL-12, IL-8, and TNF-α production in response to Lipopolysaccharide stimulation, in respect to untreated dendritic cells. In addition, treatment of dendritic cells with PlGF resulted in the suppression of naïve CD4+ T cell proliferation in an allogeneic mixed lymphocyte reaction. The results from this study indicate that PlGF can downregulate type 1 T helper immune responses by modulating the function of dendritic cells68.

Han et al. reported the effect of PlGF on the regulatory B cells (Bregs), a subset of B lymphocytes with a pivotal role in regulating immune responses involved in inflammation, autoimmunity and cancer. Using surgically removed glioma tissues, they demonstrated that glioma cells release exosomes carrying PlGF. When purified naïve B cells captured the PlGF-containing exosomes from glioma cells, they differentiated into TGF-β+ Bregs able to suppress the CD8+ T cell activities. Further, the treatment of glioma cells with an anti-PlGF antibody (TB-403), a (PlGF)-specific inhibitor, completely suppressed the expression of the TGF-β, while the exposure of glioma cells to PlGF, upregulated the expression of the TGF-β69.

A recent work has demonstrated that PlGF is selectively secreted by the helper T cells (TH17), a subset of inflammatory T cells that, producing IL-17, contribute to autoimmunity and tissue damage70 and which dysregulation is associated with various autoimmune diseases, including multiple sclerosis and rheumatoid arthritis71,72. In the same article, the authors demonstrate that T cell-produced PlGF is functionally active in promoting angiogenesis and that PlGF stimulates TH17 cell differentiation by activating STAT3 via binding to the Flt1 and NRP1 receptors. Also, they show that the overexpression of PlGF in T cells exacerbates disease in mice with collagen-induced arthritis and that PlGF concentrations also correlated with IL-17 concentrations in synovial fluid from patients with rheumatoid arthritis. Overall, they provide an insight into the links between angiogenesis, TH17 cell development, inflammation and autoimmunity, emphasizing the importance of PlGF in these processes70. Previously, Kang et al., using a PlGF-overexpressing transgenic mice model, showed that PlGF secretion was upregulated in isolated T-cells, suggesting PlGF as a regulator of T-cell differentiation. In addition, the authors evidenced that the CD4 + T-cells isolated from the spleen of transgenic mice indicated greater inflammatory TH1 and TH17 helper T-cell differentiation, thereby emphasizing the role of PlGF in T-cell differentiation and development73.

In conclusion, in this study we have identified some genes and pathways, known to be implicated in the angiogenesis process and the immune response, that are associated with the variation of circulating PlGF. The identification of those genes corroborates the link between PlGF protein levels and immune system function and could lead to new preventive and therapeutic strategies in immune and/or angiogenesis-related diseases in which PlGF has been implicated.

Methods

Population samples and PlGF measurement

The discovery sample includes 1600 volunteer individuals recruited through a population-based sampling strategy in three small isolated villages of the Cilento region, South Italy (Campora, Gioi and Cardile). A de novo replication was performed in additional 468 subjects from the same villages18,19. A subset of this sample (N = 871) was used in our previous study on the genetics of PlGF plasma levels17.

The Cilento study was approved by the ethics committee of “Azienda Sanitaria Locale Napoli 1” (Medical committee, in 2003 protocol #291 and #113, and in 2007 protocol #556) and the ethics committee “Carlo Romano” University of Naples “Federico II” (Research committee, in 2013 protocol #231/13). The study was conducted according to the criteria set by the declaration of Helsinki and each subject signed an informed consent before participating to the study.

Blood samples were collected in the morning after the participants had been fasting for at least 12 h. Aliquots of plasma were immediately prepared and stored at − 80 °C and were subsequently used for the assessment of PlGF levels. PlGF levels (pg/ml) were measured using an electrochemiluminescence immunoassay on the Elecsys2010 analyzer (Roche Diagnostics, Mannheim, Germany).

Pregnant women were excluded from the study because of their high level of PlGF in the plasma17.

A logarithmic transformation was applied to the PlGF levels to normalize the trait distribution and the transformed trait was used in all subsequent statistical analyses.

The Mann–Whitney U test was used to compare median PlGF plasma levels among the samples.

GWAS and replication study

Genotyping in the discovery sample was performed with 370 K and OmniExpress Illumina chips, phasing and imputation were conducted separately by platform with the MaCH74 (http://csg.sph.umich.edu/abecasis/mach/index.html) and minimac75 (https://genome.sph.umich.edu/wiki/Minimac) software respectively, using 1000 Genomes Phase 1 v3 data as reference. Quality control filters applied before imputation were call rate > 95% for SNPs and samples and minor allele frequency (MAF) > 0.01. GWAS was carried out through a mixed model linear regression where the variance/covariance matrix is the genomic kinship to account for relatedness between individuals. Age and gender were used as covariates and an additive genetic model was considered. The analysis was performed with GenABEL R package76,77 (https://cran.r-project.org/src/contrib/Archive/GenABEL/) for genotyped SNPs and ProbABEL78 (https://github.com/GenABEL-Project/ProbABEL) for imputed data. SNPs with imputation quality (Rsq in MaCH) < 0.8 or MAF < 0.01 were excluded.

To select linkage disequilibrium (LD)-based independent association signals among the PlGF-associated SNPs from the discovery phase, we conducted the clumping procedure implemented in PLINK79 (https://zzz.bwh.harvard.edu/plink/) and picked the index SNPs with the most significant association p-value from each clumped association region based on the GWAS. The 1000 Genomes Phase 1 v3 genotypes were used as reference panel; the following thresholds for clumping were applied: association p-value < 1 × 10–6, physical distance > 1 Mb, and r2 < 0.01.

Independently associated SNPs in the discovery were de-novo genotyped in the 468 individuals of the replication sample using TaqMan SNP genotyping assays, following the manufacturer’s instructions (Bio-Rad, USA).

To assess evidence for replication, test-statistics of discovery and in silico replication samples were meta-analysed using a fixed effect model weighted by inverse variance, using Metal80 (http://csg.sph.umich.edu/abecasis/metal/index.html). SNPs were considered replicated if the effect was in the same direction between discovery and replication, and the p-value in the meta-analysis was lower than in the discovery sample.

The proportion of phenotypic variance explained by the PlGF-associated variants was estimated fitting 3 linear mixed effect models, in which PlGF levels were regressed, respectively, on: (1) no covariate; (2) gender and age; (3) gender, age and additive effect of each of the three SNPs. The variance explained by the associated variants was estimated using the gaston R package77,81 (https://CRAN.R-project.org/package=gaston) lmm.aireml function82 (https://search.r-project.org/CRAN/refmans/gaston/html/lmm.aireml.html), which uses the genomic kinship matrix to correct for relatedness between individuals.

Colocalization analysis

Colocalization analysis is a method used to identify shared regulatory variants between a GWAS and an eQTL analysis: if a GWAS trait and a gene expression analysis share the same associated SNP, it may suggest a regulatory role of the SNP mediated through the gene on the GWAS trait.

Colocalization analysis of the PlGF-associated loci with gene expression was conducted using the discovery GWAS results and the cis-eQTL results from 48 tissues in the GTEx Project (Version 7)20. We considered the three PlGF-associated loci on chromosomes 7, 10 and 13, and, for each of them, we identified all transcripts and all tissue transcript pairs with reported eQTLs within ± 500 kb of each GWAS index SNP. We used the colocalization method outlined by Giambartolomei et al.21, and implemented in the coloc.abf function from the R package77 coloc (https://rdrr.io/cran/coloc/man/coloc.abf.html), applying the default parameters. Evidence for colocalization was defined as an H4 ≥ 0.8, which represents the posterior probability that the association with PlGF and gene expression is due to the same underlying variant.

Analysis of the imputed genetically regulated gene expression (TWAS)

TWAS is a way of integrating expression data and genome-wide association studies, allowing the discovery of genes associated with traits of interest. TWAS analysis typically consists of two steps: first, a model is trained to predict gene expressions from local genetic variants near the focal genes, using a reference dataset containing both genotype and expression data; then, the pre-trained model is used to predict expressions from genotypes in the dataset under study, which contains genotypes and phenotypes. The predicted expressions are then associated with the phenotype of interest.

The genetically regulated gene expression was imputed using PrediXcan v.7 data36 (https://github.com/hakyimlab/PrediXcan). PrediXcan was used to impute the transcriptome of the 1600 individuals who were included in the discovery GWAS using the SNP weights derived from models trained on reference transcriptome datasets of the 48 tissues in the GTEx Project (Version 7)20, downloaded from PredictDB, and the genome-wide imputed variants showing MAF > 0.01 and RSQ > 0.80. The residuals of the regression of PlGF levels on sex, age and kinship matrix were used as phenotype. We tested for association 1866–8753 protein-coding genes (depending on the tissue) using a linear regression. An FDR < 0.05 was considered as threshold for statistical significance of associations.

Gene-based analysis

A gene-based association analysis was performed using VEGAS2 software22 (https://vegas2.qimrberghofer.edu.au/). VEGAS2 is an extension of the VErsatile Gene-based Association Study (VEGAS) approach83 that uses 1000 Genomes reference data to estimate LD between variants. All the variants from the discovery GWAS were used for the analysis. A ‘ − 20 kbloc’ parameter, which assigns all variants in the gene or within 20 kb on either side of a gene transcription site, was considered. All the variants assigned to a gene were used to compute the gene-based p-value. The 1000 Genomes European population was considered for the estimation of LD between variants. In total, 24,009 autosomal genes were analysed.

Pathway analysis

A pathway enrichment analysis was performed using the GSA-SNP2 software23 (https://sourceforge.net/projects/gsasnp2), considering the VEGAS2 gene-based association data as input for the analysis. 6425 Gene Ontology (GO) gene sets, with a pathway size between 10 and 200, were used as reference pathways. To reduce false positives, highly correlated adjacent genes (inter-gene correlations > 0.5) were removed using the European 1000 Genome genotype data as reference. Pathways were considered significantly enriched if they showed a q-value < 0.05. To summarize and reduce functional redundancies in the enrichment results, the REVIGO software24 (http://revigo.irb.hr/) was used. The 27 significantly enriched pathways were used as input. A semantic similarity threshold of 0.5 was considered to select non-redundant GO terms.