Genetics of PlGF plasma levels highlights a role of its receptors and supports the link between angiogenesis and immunity

Placental growth factor (PlGF) is a member of the vascular endothelial growth factor family and is involved in bone marrow-derived cell activation, endothelial stimulation and pathological angiogenesis. High levels of PlGF have been observed in several pathological conditions especially in cancer, cardiovascular, autoimmune and inflammatory diseases. Little is known about the genetics of circulating PlGF levels. Indeed, although the heritability of circulating PlGF levels is around 40%, no studies have assessed the relation between PlGF plasma levels and genetic variants at a genome-wide level. In the current study, PlGF plasma levels were measured in a population-based sample of 2085 adult individuals from three isolated populations of South Italy. A GWAS was performed in a discovery cohort (N = 1600), followed by a de novo replication (N = 468) from the same populations. The meta-analysis of the discovery and replication samples revealed one signal significantly associated with PlGF circulating levels. This signal was mapped to the PlGF co-receptor coding gene NRP1, indicating its important role in modulating the PlGF plasma levels. Two additional signals, at the PlGF receptor coding gene FLT1 and RAPGEF5 gene, were identified at a suggestive level. Pathway and TWAS analyses highlighted genes known to be involved in angiogenesis and immune response, supporting the link between these processes and PlGF regulation. Overall, these data improve our understanding of the genetic variation underlying circulating PlGF levels. This in turn could lead to new preventive and therapeutic strategies for a wide variety of PlGF-related pathologies.

. Characteristics of study participants from the discovery and replication cohorts. The best signal on chromosome 10, the rs17296631, is in complete LD with rs145141871, the second most associated variant in this locus, and both are reported as eQTLs for the NRP1 gene in the Brain Anterior cingulate cortex tissue. For both variants, the alleles that lead to higher expression levels of the NRP1 gene are also associated with an increase of PlGF levels. Despite this evidence, coloc did not reveal significant colocalization evidence, confirming that in such a situation of complete LD between two variants, the program shows uncertainty in identifying the causal one 21 . Transcription-wide and gene-based association analyses. To further identify loci associated with PlGF levels, we performed a transcription-wide association analysis (TWAS) and a gene-based analysis.
In our study, TWAS analysis identified several genes in different genomic regions whose expression in particular tissues, is associated with circulating levels of PlGF protein. The list of the 52 gene/tissue pairs, showing a significant association at FDR (< 0.05), is reported in the S1 Table. Interestingly, the majority of them are located on chromosome 6p22.1-p21. 33, in the HLA region. These results suggest that multiple variants, each likely with The blue dashed line indicates the suggestive association threshold (p-value < 1 × 10 -6 ), while the red dashed line indicates the genome-wide significant threshold (p-value < 5 × 10 -8 ). Blue dots are the SNPs associated at the suggestive significance level (p-value < 1 × 10 -6 ). For each locus, the nearest gene is reported. www.nature.com/scientificreports/ a marginal level of significance and located in the HLA genomic region, might contribute, in aggregate, to the regulation of the PlGF circulating levels. The top 10 genes from the VEGAS2 gene-based analysis 22 are reported in the S2 Table. Although none of the analysed genes showed a statistically significant False Discovery Rate (FDR)-adjusted p-value, the best association was found for the FLT1 gene, in accordance with the result obtained in the GWAS. Furthermore, except for the SMIM15-AS1 gene on chromosome 5, the other associated genes are all located in the HLA region on chromosome 6, in conformity with the results obtained in the TWAS analysis, and again suggesting an involvement of this locus on the regulation of the PlGF levels.

Pathway analysis.
To discover biological pathways involved in the modulation of the circulating levels of PlGF, we performed an enrichment analysis using the GSA-SNP2 program 23 . The analysis highlighted 27 Gene Ontology (GO) significantly enriched pathways (q-value < 0.05) (16 Biological Processes, 5 Cellular Components and 6 Molecular Functions), reported in Table 3. In line with the results obtained in the TWAS and the gene-based association analysis, the majority of the significantly enriched pathways were related to immune response. In fact, the GO terms summarization analysis performed by REVIGO program 24 evidenced that 15 out of the 16 Biological Processes were linked to the immunoglobulin mediated immune response, and that 2 Cellular Components and one Molecular Function were related to the MHC class II complex. Also, 3 GO terms were linked to the lumenal side of the endoplasmic reticulum membrane. Other significantly over-represented pathways were the N-glycan processing, the oxidoreductase activity, the mismatched DNA binding, the folic acid binding and the AU-rich element binding.

Discussion
Our study is the first GWAS of circulating PlGF levels. It was undertaken in 2068 individuals from three population isolates of the Cilento area, South Italy, and represents the largest survey of PlGF measurement in a population-based sample. In this study, we have identified 3 chromosomal regions (7p15.3, 10p11.22 and 13q12.3) harboring variants associated with PlGF plasma levels. The three top variants in these loci explain about 4% of PlGF phenotypic variance. www.nature.com/scientificreports/ The newly identified regions include many interesting and plausible candidate genes. The lead variant on chromosome 10, rs17296631, is located in an intergenic region upstream the NRP1 gene. This gene encodes for the Neuropilin-1, a membrane protein devoid of tyrosine kinase activity which acts as a co-receptor for PlGF. Recent studies suggest that NRP-1, through the binding of PlGF and other growth factors, can play an important role in the activation of specific signal transduction pathways also independently of other membrane receptors 5,25 . A positive correlation of expression of PlGF and NRP-1 has been observed in breast cancer 4 , and PlGF activation of NRP-1 could also promote tumor cell survival in a paracrine manner in a mouse model of medulloblastoma 25 . These observations are in line with the results of the eQTL analysis, in which the T allele of the rs17296631 variant is associated with an increase of both PlGF levels and NRP1 gene expression.
On chromosome 13, the rs9551465 is located in an intron of the FLT1 gene. This gene encodes for a tyrosineprotein kinase, Flt-1, belonging to the vascular endothelial growth factor receptor (VEGFR) family, which functions as a receptor for VEGFA, VEGFB and PlGF. The protein is also present in a soluble form, sFlt-1, lacking of the transmembrane and intracellular domains. The PlGF binding to Flt-1 stimulates angiogenesis via both direct and indirect mechanisms: the activation of Flt-1 by PlGF results in phosphorylation of specific tyrosine residues in Flt-1 and downstream signaling different from those activated by VEGFA binding; also, it has been proposed, based on in vitro data and overexpression studies, that the binding of PlGF to Flt-1 induces pro-angiogenic effects as PlGF shifts VEGFA from Flt-1 to VEGFR2, enhancing the effects of VEGFA 26 . In the present study, we have found that the PlGF associated rs9551465 variant in the FLT1 gene is also an eQTL for FLT1: in particular, the variant allele (T) is associated with higher PlGF protein levels and with a lower FLT1 gene expression. This is consistent with the observation that a higher expression of Flt-1, acting as a "decoy" for PlGF, can determine lower detectable circulating levels of the ligand protein 14 .
The third associated variant, rs77619310, is located on chromosome 7p15.3 in an intron of the RAPGEF5 gene. This gene, encoding for a guanine nucleotide exchange factor (GEF) for the GTPases Rap1, Rap2 and M-RAS, serves as RAS activator by promoting the acquisition of GTP to maintain the active GTP-bound state and is a key link between cell surface receptors and RAS activation 27,28 . Interestingly, Rap1, activated by different GEFs, including RAPGEF5, acts as a regulator of several basic cellular functions such as adhesion, polarity, differentiation and growth 29,30 . In the endothelium, Rap1 is a key positive regulator of angiogenic process 31 . It functions downstream of the main angiogenic growth-controlling receptors in endothelial cells, including VEGFA 32 . Also, Rap1 promotes VEGF-mediated angiogenesis through the activation of VEGFR2 33 and is capable of regulating endothelial barrier permeability by VEGF stimulation 34 .
In a previous candidate gene study, we reported the association of the PlGF gene locus with the levels of the protein in plasma 17 . More recently, another study has found an association between an additional polymorphism in the PlGF gene and the protein plasma levels 35 . In the current study, no variants in the PlGF gene region reached the statistical significance, however, some SNPs located between 5 and 20 kb upstream of the gene were nominally associated with PlGF levels (rs4903273, p-value = 2.04 × 10 -3 ) and showed a moderate LD (r 2 = 0.56) with the rs2268614 SNP described in our previous work 17 .
An implication of HLA genes in the regulation of PlGF levels was detected at statistical significance in the TWAS and also suggested by a gene-based association analysis, confirming an increased power of TWAS analysis which takes advantage from gene expression level data 36 . In particular, such genes belong to the HLA class III and are involved in both the immune system function and the angiogenic process. In detail, gene expression levels of LY6G5B and LY6G5C, belonging to the cluster of leukocyte antigen-6 (LY6) genes, showed an association with PlGF circulating levels in multiple tissues. Little is known about the proteins coded by these two genes, however other members of the LY6 protein family, expressed in various types of tissues and at different stages of cell differentiation, are involved in cell proliferation, cell migration, cell-cell interactions, immune cell maturation, macrophage activation, and cytokine production 37,38 , and their overexpression or dysregulation is associated with tumorigenesis and autoimmune diseases 39 . Also, the C4A gene, encoding for the acidic form of the complement factor 4, plays a pivotal role in the activation of immune defenses and the clearance of immune complexes or apoptotic debris in vitro and in vivo studies 40,41 . A lower number of gene copies of C4A has been linked to an increased risk for different autoimmune diseases, such as Systemic Lupus Erythematosus, Type 1 Diabetes and Juvenile Dermatomyositis [42][43][44] . In addition, the deficiency of the C4A gene has also been associated with preeclampsia, a well-established PlGF related disease [14][15][16]45,46 , with a lower gene copy number associated with an increase of disease severity, supporting the importance of the classical pathway of the complement system in this pathology 47 . Also, higher levels of C4A have been observed in patients with neovascular age-related macular degeneration 48,49 , an ocular pathology also characterized by an increase of PlGF levels 11 . Finally, some evidences link the pseudogene MICD to the immune system: in fact, SNPs in this gene region have been associated with eosinophil, basophil and granulocyte count 50 and with different autoimmune disease, such as vitiligo 51,52 , Graves' disease 53 and psoriatic arthritis 54 .
Other genes in the same region, the Dimethylarginine dimethylaminohydrolase-2 (DDAH2) and the TCF19, are implicated in the angiogenic process. The DDAH2 gene encodes for an enzyme that positively regulates the nitric oxide (NO) generation by metabolizing the asymmetric dimethylarginines (ADMA), which are inhibitors of the nitric oxide synthase (NOS) activity 55 . DDAH2 acts as a key regulator of the angiogenesis: in fact, its overexpression enhances the proliferation and migration of endothelial cells through the induction of expression and secretion of VEGF, both regulating the production of endothelial NO through the stimulation of endothelial NOS (eNOS) activity 56 , and in an eNOS-independent manner, via the activation of Sp1 binding site of the VEGF promoter 57 . DDAH2 is considered an antiatherosclerotic molecule: in fact, hypermethylation of DDAH2 promoter, accompanied by its reduced expression, correlates with endothelial dysfunction in patients affected by Coronary Artery Disease 58 . The inhibition of DDAH2 has also been correlated with an attenuation of aberrant angiogenesis and with the improvement of vascular regeneration in mice models of oxygen-induced retinopathy  59 , and similar effects on the retinal vasculature have also been observed in PlGF deficient OIR mouse models 1 . Also, decreased expression levels of DDAH2 have been observed in women affected by preeclampsia 60,61 .
The TCF19 gene encodes a transcription factor containing a PHD-type zinc finger domain and plays a role in the proliferation and apoptosis of pancreatic beta cells 62 . SNPs in this gene have been associated with chronic Hepatitis B, Type 1 and Type 2 Diabetes 63-65 . Recently its overexpression has been linked to an increase of cell proliferation in different types of cancer 66,67 .
It is of interest to note that, although colocalization and TWAS analyses both use GTEx expression data, the PlGF associated genes, which colocalize with eQTL signals, were not detected by TWAS analysis. From this observation, we noted that in the PredictDB database (http:// predi ctdb. org/) the prediction models used for the imputation of the expression levels are present, depending on the tissue, for a limited and variable number of genes. Also, imputation is performed using a less dense panel of variants compared to that used in the GWAS. Both these circumstances might explain the possibility of bypassing some association signals in the TWAS. In our analysis, NRP1 and FLT1 imputed genes expression levels are missing for those tissues (Brain Anterior cingulate cortex and Thyroid, respectively) in which the colocalization was identified. For the RAPGEF5 gene, for which a very high posterior probability of colocalization was evidenced in the Artery Tibial tissue, the TWAS failed to detect an association (p-value = 0.017), probably because imputed expression levels were obtained using a reduced number of variants (N = 27), not including any of the PlGF associated variants detected in the GWAS at this locus.
The pathway analysis also revealed an over-representation of several pathways linked to the immune system. To date, only a few studies have linked PlGF to the adaptive immune response. Lin et al. reported the evidence of immunosuppressive properties of PlGF. They found that PlGF inhibited the activation and maturation of human dendritic cells, differentiated from CD14 + monocytes, through the NF-κB signaling pathway. PlGFtreated dendritic cells resulted in the downregulation of maturation markers CD80, CD86, CD83, CD40, and MHC-II expression as well as the inhibition of IL-12, IL-8, and TNF-α production in response to Lipopolysaccharide stimulation, in respect to untreated dendritic cells. In addition, treatment of dendritic cells with PlGF resulted in the suppression of naïve CD4 + T cell proliferation in an allogeneic mixed lymphocyte reaction. The results from this study indicate that PlGF can downregulate type 1 T helper immune responses by modulating the function of dendritic cells 68 .
Han et al. reported the effect of PlGF on the regulatory B cells (Bregs), a subset of B lymphocytes with a pivotal role in regulating immune responses involved in inflammation, autoimmunity and cancer. Using surgically removed glioma tissues, they demonstrated that glioma cells release exosomes carrying PlGF. When purified naïve B cells captured the PlGF-containing exosomes from glioma cells, they differentiated into TGF-β + Bregs able to suppress the CD8 + T cell activities. Further, the treatment of glioma cells with an anti-PlGF antibody (TB-403), a (PlGF)-specific inhibitor, completely suppressed the expression of the TGF-β, while the exposure of glioma cells to PlGF, upregulated the expression of the TGF-β 69 .
A recent work has demonstrated that PlGF is selectively secreted by the helper T cells (T H 17), a subset of inflammatory T cells that, producing IL-17, contribute to autoimmunity and tissue damage 70 and which dysregulation is associated with various autoimmune diseases, including multiple sclerosis and rheumatoid arthritis 71,72 . In the same article, the authors demonstrate that T cell-produced PlGF is functionally active in promoting angiogenesis and that PlGF stimulates T H 17 cell differentiation by activating STAT3 via binding to the Flt1 and NRP1 receptors. Also, they show that the overexpression of PlGF in T cells exacerbates disease in mice with collagen-induced arthritis and that PlGF concentrations also correlated with IL-17 concentrations in synovial fluid from patients with rheumatoid arthritis. Overall, they provide an insight into the links between angiogenesis, T H 17 cell development, inflammation and autoimmunity, emphasizing the importance of PlGF in these processes 70 . Previously, Kang et al., using a PlGF-overexpressing transgenic mice model, showed that PlGF secretion was upregulated in isolated T-cells, suggesting PlGF as a regulator of T-cell differentiation. In addition, the authors evidenced that the CD4 + T-cells isolated from the spleen of transgenic mice indicated greater inflammatory T H 1 and T H 17 helper T-cell differentiation, thereby emphasizing the role of PlGF in T-cell differentiation and development 73 .
In conclusion, in this study we have identified some genes and pathways, known to be implicated in the angiogenesis process and the immune response, that are associated with the variation of circulating PlGF. The identification of those genes corroborates the link between PlGF protein levels and immune system function and could lead to new preventive and therapeutic strategies in immune and/or angiogenesis-related diseases in which PlGF has been implicated.

Methods
Population samples and PlGF measurement. The discovery sample includes 1600 volunteer individuals recruited through a population-based sampling strategy in three small isolated villages of the Cilento region, South Italy (Campora, Gioi and Cardile). A de novo replication was performed in additional 468 subjects from the same villages 18,19 . A subset of this sample (N = 871) was used in our previous study on the genetics of PlGF plasma levels 17 .
The Cilento study was approved by the ethics committee of "Azienda Sanitaria Locale Napoli 1" (Medical committee, in 2003 protocol #291 and #113, and in 2007 protocol #556) and the ethics committee "Carlo Romano" University of Naples "Federico II" (Research committee, in 2013 protocol #231/13). The study was conducted according to the criteria set by the declaration of Helsinki and each subject signed an informed consent before participating to the study.
Blood samples were collected in the morning after the participants had been fasting for at least 12 h. Aliquots of plasma were immediately prepared and stored at − 80 °C and were subsequently used for the assessment of www.nature.com/scientificreports/ PlGF levels. PlGF levels (pg/ml) were measured using an electrochemiluminescence immunoassay on the Elec-sys2010 analyzer (Roche Diagnostics, Mannheim, Germany). Pregnant women were excluded from the study because of their high level of PlGF in the plasma 17 .
A logarithmic transformation was applied to the PlGF levels to normalize the trait distribution and the transformed trait was used in all subsequent statistical analyses.
The Mann-Whitney U test was used to compare median PlGF plasma levels among the samples.

GWAS and replication study.
Genotyping in the discovery sample was performed with 370 K and Omni-Express Illumina chips, phasing and imputation were conducted separately by platform with the MaCH 74 (http:// csg. sph. umich. edu/ abeca sis/ mach/ index. html) and minimac 75 (https:// genome. sph. umich. edu/ wiki/ Minim ac) software respectively, using 1000 Genomes Phase 1 v3 data as reference. Quality control filters applied before imputation were call rate > 95% for SNPs and samples and minor allele frequency (MAF) > 0.01. GWAS was carried out through a mixed model linear regression where the variance/covariance matrix is the genomic kinship to account for relatedness between individuals. Age and gender were used as covariates and an additive genetic model was considered. The analysis was performed with GenABEL R package 76,77 (https:// cran.r-proje ct. org/ src/ contr ib/ Archi ve/ GenAB EL/) for genotyped SNPs and ProbABEL 78 (https:// github. com/ GenAB EL-Proje ct/ ProbA BEL) for imputed data. SNPs with imputation quality (Rsq in MaCH) < 0.8 or MAF < 0.01 were excluded.
To select linkage disequilibrium (LD)-based independent association signals among the PlGF-associated SNPs from the discovery phase, we conducted the clumping procedure implemented in PLINK 79 (https:// zzz. bwh. harva rd. edu/ plink/) and picked the index SNPs with the most significant association p-value from each clumped association region based on the GWAS. The 1000 Genomes Phase 1 v3 genotypes were used as reference panel; the following thresholds for clumping were applied: association p-value < 1 × 10 -6 , physical distance > 1 Mb, and r 2 < 0.01.
Independently associated SNPs in the discovery were de-novo genotyped in the 468 individuals of the replication sample using TaqMan SNP genotyping assays, following the manufacturer's instructions (Bio-Rad, USA).
To assess evidence for replication, test-statistics of discovery and in silico replication samples were metaanalysed using a fixed effect model weighted by inverse variance, using Metal 80 (http:// csg. sph. umich. edu/ abeca sis/ metal/ index. html). SNPs were considered replicated if the effect was in the same direction between discovery and replication, and the p-value in the meta-analysis was lower than in the discovery sample.
The proportion of phenotypic variance explained by the PlGF-associated variants was estimated fitting 3 linear mixed effect models, in which PlGF levels were regressed, respectively, on: (1) no covariate; (2) gender and age; (3) gender, age and additive effect of each of the three SNPs. The variance explained by the associated variants was estimated using the gaston R package 77,81 (https:// CRAN.R-proje ct. org/ packa ge= gaston) lmm.aireml function 82 (https:// search. r-proje ct. org/ CRAN/ refma ns/ gaston/ html/ lmm. aireml. html), which uses the genomic kinship matrix to correct for relatedness between individuals.
Colocalization analysis. Colocalization analysis is a method used to identify shared regulatory variants between a GWAS and an eQTL analysis: if a GWAS trait and a gene expression analysis share the same associated SNP, it may suggest a regulatory role of the SNP mediated through the gene on the GWAS trait.
Colocalization analysis of the PlGF-associated loci with gene expression was conducted using the discovery GWAS results and the cis-eQTL results from 48 tissues in the GTEx Project (Version 7) 20 . We considered the three PlGF-associated loci on chromosomes 7, 10 and 13, and, for each of them, we identified all transcripts and all tissue transcript pairs with reported eQTLs within ± 500 kb of each GWAS index SNP. We used the colocalization method outlined by Giambartolomei et al. 21 , and implemented in the coloc.abf function from the R package 77 coloc (https:// rdrr. io/ cran/ coloc/ man/ coloc. abf. html), applying the default parameters. Evidence for colocalization was defined as an H4 ≥ 0.8, which represents the posterior probability that the association with PlGF and gene expression is due to the same underlying variant.

Analysis of the imputed genetically regulated gene expression (TWAS). TWAS is a way of inte-
grating expression data and genome-wide association studies, allowing the discovery of genes associated with traits of interest. TWAS analysis typically consists of two steps: first, a model is trained to predict gene expressions from local genetic variants near the focal genes, using a reference dataset containing both genotype and expression data; then, the pre-trained model is used to predict expressions from genotypes in the dataset under study, which contains genotypes and phenotypes. The predicted expressions are then associated with the phenotype of interest.
The genetically regulated gene expression was imputed using PrediXcan v.7 data 36 (https:// github. com/ hakyi mlab/ Predi Xcan). PrediXcan was used to impute the transcriptome of the 1600 individuals who were included in the discovery GWAS using the SNP weights derived from models trained on reference transcriptome datasets of the 48 tissues in the GTEx Project (Version 7) 20 , downloaded from PredictDB, and the genome-wide imputed variants showing MAF > 0.01 and RSQ > 0.80. The residuals of the regression of PlGF levels on sex, age and kinship matrix were used as phenotype. We tested for association 1866-8753 protein-coding genes (depending on the tissue) using a linear regression. An FDR < 0.05 was considered as threshold for statistical significance of associations.
Gene-based analysis. A gene-based association analysis was performed using VEGAS2 software 22 (https:// vegas2. qimrb ergho fer. edu. au/). VEGAS2 is an extension of the VErsatile Gene-based Association Study (VEGAS) approach 83 that uses 1000 Genomes reference data to estimate LD between variants. All the variants from the discovery GWAS were used for the analysis. A ' − 20 kbloc' parameter, which assigns all variants in the