Novel causative genes for heritable pulmonary arterial hypertension

Pulmonary arterial hypertension (PAH) is a rare disorder with a poor prognosis. Deleterious variation within genes encoding components of the transforming growth factor-ß pathway, particularly the bone morphogenetic protein type 2 receptor (BMPR2), underlie the majority of heritable forms of PAH. Since the missing genetic contribution likely involves mutations in genes confined to small numbers of cases, we performed whole genome sequencing in 1038 PAH index cases and 6385 subjects with other rare diseases. Case-control analyses revealed significant overrepresentation of rare variants in novel genes, namely ATP13A3, AQP1 and SOX17, and provided independent validation of a critical role for GDF2 in PAH. Mutations in GDF2, encoding a ligand for BMPR2, led to reduced secretion from transfected cells. In addition, we confirmed the presence of mutations in most, but not all, previously reported PAH genes, and provide evidence for further putative genes. Taken together these findings provide new insights into the molecular basis of PAH and indicate unexplored pathways for therapeutic intervention.


Introduction
Idiopathic and heritable pulmonary arterial hypertension (PAH) are rare disorders characterised by occlusion of arterioles in the lung 1 , leading to marked increases in pulmonary vascular resistance 2 . Life expectancy from diagnosis averages 3-5 years 3 , with death ensuing from failure of the right ventricle.
Mutations in the gene encoding the bone morphogenetic protein type 2 receptor (BMPR2), a receptor for the transforming growth factor-beta (TGF-ß) superfamily 4, 5 account for over 80% of families with PAH, and approximately 20% of sporadic cases 6 . Mutations have been identified in genes encoding other components of the TGF-ß/bone morphogenetic protein (BMP) signalling pathways, including ACVRL1 7 and ENG 8 . On endothelial cells, BMPR2 and ACVRL1 form a signaling complex, utilizing ENG as a co-receptor. Case reports of rare sequence variation in the BMP signalling intermediaries, SMAD1, SMAD4 and SMAD9 9, 10 , provide compelling evidence for a central role of dysregulated BMP signalling in PAH pathogenesis.
Analysis of coding variation in BMPR2-negative kindreds revealed heterozygous mutations in genes not directly impacting on the TGF-ß/BMP pathway, including CAV1 11 , and the potassium channel, KCNK3 12 . Deletions and loss of function mutations in TBX4, an essential regulator of embryonic development, were identified in childhood onset PAH 13 . A clinically and pathologically distinct form of PAH, known as pulmonary veno-occlusive disease or pulmonary capillary haemangiomatosis (PVOD/PCH), was shown recently to be caused by biallelic recessive mutations in EIF2AK4 14, 15 , a kinase in the integrated stress response.
The purpose of the present study was to identify additional rare sequence variation contributing to the genetic architecture of PAH, and to assess the relative contribution of rare variants in genes implicated in prior studies. A major finding is that rare causal heterozygous variants in several previously unidentified genes (ATP13A3, AQP1 and SOX17) were significantly overrepresented in the PAH cohort, and we provide independent validation for GDF2 as a causal gene.

Description of the PAH cohort
In total, 1048 PAH cases (1038 index cases and 10 related affected individuals) were recruited for WGS. Of these, 908 (86.7%) were diagnosed with idiopathic PAH, 58 (5.5%) gave a family history of PAH and 60 (5.7%) gave a history of drug exposure associated with PAH 16 . Twenty two cases (2.1%) held a clinical diagnosis of PVOD/PCH (Figure 1a). Demographic and clinical characteristics of the PAH cohort are provided in Table 1.

Pathogenic variants in previously reported PAH disease genes
Our filtering strategy detected rare deleterious variation in previously reported PAH genes in 19.9% of the PAH cohort. For BMPR2, rare heterozygous mutations were identified in 160 of 1048 cases (15.3%). The frequency of BMPR2 mutations in familial PAH was 75.9%, in sporadic cases 12.2%, and 8.3% in anorexigen-exposed PAH cases. Forty-eight percent of BMPR2 mutations were reported previously 17 , and the remainder were newly identified in this study. Fourteen percent of BMPR2 mutations resulted in the deletion of larger proteincoding regions ranging from 5 kb to 3.8 Mb in size. Table S1 provides the breakdown of BMPR2 SNVs and indels, and the larger deletions are shown in Figure 2a-c with a detailed summary in Table S2.
In a case-control comparison of the frequencies of deleterious variants confined to the previously reported PAH genes, we observed significant overrepresentation of rare variants in BMPR2, TBX4, ACVRL1 and biallelic variants in EIF2AK4 only (P < 0.05) ( Table S4).

Identification of novel PAH disease genes
The strategy to identify novel causative genes in PAH employed a series of case-control analyses (Figure 1b). To identify signals that might be masked by variants in previously reported PAH genes, we excluded subjects with rare variants and deletions in BMPR2, EIF2AK4, ENG, ACVRL1, TBX4, SMAD9 and KCNK3. A genome-wide comparison of protein truncating variants (PTVs), representative of high impact variants, identified a higher frequency of PTVs in ATP13A3 (6 cases) (P adj = 0.0346). Moreover, we identified additional PTVs in several putative PAH genes, including EVI5 (5 cases, 1 control) and KDR (4 cases, 0 controls) (Figure 3a), that require further validation to evaluate their contribution to PAH pathogenesis (Table S5).
We next analysed rare missense variants overrepresented in the PAH cohort, again excluding subjects with variants in the previously validated PAH genes. This revealed significant overrepresentation of rare variants in GDF2 after correction for multiple testing (P adj = 0.0023), followed by AQP1 (Figure 3b and Table S6). Next, in a combined analysis of rare missense variants and PTV, only GDF2 remained significant (P = 0.001). Rare variants in additional putative genes occurred at higher frequency in cases compared to controls, including AQP1, ALPPL2, ATP13A3, OR8U1, IFT74, FLNA, SOX17, ATP13A5, C3orf20 and PIWIL1 (uncorrected P value < 0.0005), but were not significant after correction for multiple testing (Figure 3c and Table S7).
In order to increase power to detect rare associations, we deployed SKAT-O on filtered rare PTVs and missense variants. Excluding previously reported genes, this analysis revealed an association with rare variants in AQP1 (P adj = 4.28x10 -6 ) and SOX17 (P adj = 6.7x10 -5 ) ( Figure  3d). AQP1 and SOX17 were both also nominally significant in the combined burden tests, described above. Association was also found with rare variants in MFRP (P adj = 1.3x10 -5 ). However, we consider MFRP a false-positive finding for reasons given in the Discussion. Table S8 shows the top 50 most significant genes identified by SKAT-O, providing further candidates to be evaluated in future studies. Details of rare variants in novel PAH genes (GDF2, ATP13A3, AQP1, SOX17) identified in cases are provided in Table S9.
Notably, a genome-wide assessment of larger structural variation did not identify any additional large deletions after exclusion of subjects harbouring deletions in BMPR2 ( Figure  2d-e).
The proportion of PAH cases with mutations in the new genes was 3.5%. The clinical characteristics of PAH cases with mutations in these genes are provided in Table 2b. Of note, cases with mutations in SOX17 and AQP1 were significantly younger at diagnosis (32.8 ± 16.2 years [P = 0.002] and 36.9 ± 14.3 years [P = 0.013], respectively) compared to cases with no mutations in the previously established genes (51.7 ± 16.6 years).

Predicted functional impact of variants in novel PAH genes
To evaluate the potential functional impact of rare variants identified in the likely causative genes we performed structural analysis of GDF2, ATP13A3, AQP1, and SOX17. In addition we undertook a functional analysis of the GDF2 variants identified, and examined mRNA expression levels of the other genes, and the impact of knockdown of ATP13A3, in relevant primary cell types.
Heterozygous mutations in GDF2 exclusive to PAH cases comprised 1 frameshift variant and 7 missense variants. GDF2 encodes growth and differentiation factor 2, also known as bone morphogenetic protein 9 (BMP9), the major circulating ligand for the endothelial BMPR2/ACVRL1 receptor complex 18 . Amino acid substitutions were assessed against the published crystal structure 19 of the prodomain bound form of GDF2 ( Figure 4). Variants clustered at the interface between the prodomain and growth factor domain. Since the prodomain is important for the processing of GDF2, it is likely that amino acid substitutions reduce the stability of the prodomain-growth factor interface. In keeping with these predictions, HEK293T cells transfected with GDF2 variants exclusive to PAH cases, demonstrated reduced secretion of mature GDF2 into the cell supernatants (Figure 4d), compared with wild type GDF2.
We identified 3 heterozygous frameshift variants, 2 stop gained, 2 splice region variants in ATP13A3, which are predicted to lead to loss of ATPase catalytic activity (Figure 5a). In addition, we identified 4 heterozygous likely pathogenic missense variants in PAH cases, two near the conserved ATPase catalytic site and predicted to destabilise the conformation of the catalytic domain (Figure 5b-d). The distribution of variants ( Figure 5a) suggests that these mutations impact critically on the function of the protein.
The majority of rare variants identified in AQP1, which encodes aquaporin 1, are situated within the critical water channel ( Figure 6). In particular the p.Arg195Trp variant, identified in 5 PAH cases, locates at the hydrophilic face of the pore. This arginine at position 195 helps define the constriction region of the AQP1 pore structure and is conserved across the water specific aquaporins 23 . Rare variants in SOX17 ( Figure S1), included 3 nonsense variants predicted to lead to loss of the beta-catenin binding region, and 6 missense variants predicted to disrupt interactions with Oct4 and beta-catenin 24,25 .
GDF2 is known to be secreted from the liver, but the cellular localization of proteins encoded by the other novel genes is less well characterised. Thus we employed immunohistochemistry to examine localization in the normal and hypertensive human pulmonary vasculature. Figure 7 shows that AQP1, ATP13A3 and SOX17 are predominantly localised to the pulmonary endothelium in normal human lung and to endothelial cells within plexiform lesions of patients with idiopathic PAH. In addition, we determined the relative mRNA expression levels of AQP1, ATP13A3 and SOX17 in primary cultures of pulmonary artery smooth muscle cells (PASMCs), pulmonary artery endothelial cells (PAECs) and blood outgrowth endothelial cells (BOECs) 26 . AQP1 was expressed in PASMCs and endothelial cells, with a trend towards higher levels in PASMCs (Figure 8a). ATP13A3 was highly expressed in both cell types (Figure 8b), whereas SOX17 was almost exclusively expressed in endothelial cells (Figure 8c). Although AQP1 and SOX17 are known to play roles in endothelial function, the function of ATP13A3 in vascular cells is entirely unknown. Thus, we determined the impact of ATP13A3 knockdown on proliferation and apoptosis of BOECs. Loss of ATP13A3 led to marked inhibition of serum-stimulated proliferation of BOECs, and increased apoptosis in serum-deprived conditions (Figure 8d-f).

Discussion
We report the most comprehensive analysis to date of rare genetic variation in a large cohort of index cases with idiopathic and heritable forms of PAH. Whilst we utilised WGS, the main goal was the identification of novel rare causal variation in the protein coding sequence underlying PAH. The approach involved a rigorous case-control comparison using a tiered search for variants. First, we searched for high impact PTVs overrepresented in cases, having excluded previously established PAH genes. This revealed PTVs in ATP13A3, a poorly characterised P-type ATPase of the P5 subfamily 27 . There is little information regarding the function of the ATPase, ATP13A3, which appears widely expressed in mouse tissues 27 . Although, the precise substrate specificity is unknown, ATP13A3 plays a role in polyamine transport(PMID:27429841). Based on available RNA sequencing data, ATP13A3 is highly expressed in human pulmonary vascular cells and cardiac tissue (https://www.encodeproject.org). We confirmed that ATP13A3 mRNA is expressed in primary cultured pulmonary artery smooth muscle cells and endothelial cells, and provide preliminary data that loss of ATP13A3 inhibits proliferation and increases apoptosis of endothelial cells. These findings are consistent with the widely accepted paradigm that endothelial apoptosis is a major trigger for the initiation of PAH 28, 29 . It will be of considerable interest to determine the role of ATP13A3 in vascular cells and whether it is functionally associated with BMP signalling, or represents a distinct therapeutic target in PAH.
Analysis of missense variation, and a combined analysis of all predicted deleterious variation, revealed that mutation at the GDF2 gene is also significant determinant of predisposition to PAH. Of the new genes identified, GDF2 provides further striking evidence for the central role of the BMP signalling pathway in PAH. GDF2 encodes the major circulating ligand for the endothelial BMPR2/ACVRL1 receptor complex 18 . Taken together, the genetic findings strongly suggest that a deficiency in GDF2/BMPR2/ACVRL1 signalling in pulmonary artery endothelial cells is critical in PAH pathobiology. The majority of GDF2 variants detected in our adult-onset PAH cohort were heterozygous missense variants, in contrast to a previous case report of childhood onset PAH due to a homozygous nonsense mutation 30 . The finding of causal GDF2 variants in PAH cases, associated with reduced production of GDF2 from cells, provides further support for investigating replacement of this factor as a therapeutic strategy in PAH 31 .
To maximise the assessment of rare variation in a case-control study design, we deployed the SKAT-O test. This approach revealed a significant association of rare variation in the aquaporin gene, AQP1, and the transcription factor encoded by SOX17. Of note, both AQP1 and SOX17 were within the top 8 ranked genes in our combined PTV and missense burden test analysis (Table S7), providing further confidence in their causative contribution to PAH.
Aquaporin 1 belongs to a family of membrane proteins that function as channels facilitating water transport in response to osmotic gradients 23 , and AQP1 is known to promote endothelial cell migration and angiogenesis 32 . Thus, approaches that maintain or restore pulmonary endothelial function could offer new therapeutic directions in PAH. Conversely, AQP1 inhibition in pulmonary artery smooth muscle cells ameliorated hypoxia-induced pulmonary hypertension in mice 33 , suggesting that further studies are required to determine the key cell type impacted by AQP1 mutations in human PAH.
Although functional studies are required to confirm the mechanisms by which mutations in SOX17 cause PAH, this finding provides additional support for the vascular endothelium as the major initiating cell type in this disorder. SOX17 encodes the SRY-box containing transcription factor 17, which plays a fundamental role in angiogenesis 34 and arteriovenous differentiation 35 . Moreover, conditional deletion of SOX17 in mesenchymal progenitors leads to impaired formation of lung microvessels 36 .
Whilst the SKAT-O analysis also provided support for the MFRP gene, recessive bi-allelic mutations in MFRP cause retinal degeneration and posterior microphthalmos 37 . The expression of MFRP transcripts is largely confined to the central nervous system 38 and the majority of variants were present in the Genome Aggregation Database (GnomAD, http://gnomad.broadinstitute.org). On the basis of these considerations, variants in MFRP are unlikely to contribute to PAH aetiology.
This analysis provides new insights on the frequency and validity of previously reported genes in PAH. We confirmed that mutations in BMPR2 are the most common genetic cause and validated rare causal variants in ACVRL1, ENG, SMAD9, TBX4, KCNK3 and EIF2AK4. Although our findings question the validity of CAV1, SMAD1 and SMAD4 as causal genes, previous reports might represent private mutations occurring in very rare families. The use of WGS in this study allowed closer interrogation of larger deletions around the BMPR2 locus than has been possible previously. Nevertheless, additional analyses are required to determine the full impact of structural variation (inversions, duplications, smaller deletions) at this and other loci.
The non-PAH cohort used in the case-control comparisons for this study comprised individuals, or relatives of individuals, with other rare diseases recruited to the NIHR Bioresource for Rare Diseases (NIHR BR-RD) in the UK (see Methods). In general, for very rare causal variants, the comparison between PAH cases and non-PAH rare disease controls should not reduce our ability to detect overrepresentation of rare variants in a particular gene in the PAH cohort, if mutations in that gene are specific to PAH. However, if rare variants in a gene were responsible for more than one phenotype, it is possible that this would reduce the power to detect overrepresentation in the PAH cohort. For example, if mutations occurred in different functional domains of the expressed protein, this might lead to PAH if mutations affected one domain, but other phenotypes if they affected another domain. Overcoming this potential limitation will require additional analysis of the functional impact of variants and their distribution within a gene, and more detailed information on the phenotypes of subjects in the non-PAH group.
Taken together, this study identifies rare sequence variation in new genes underlying heritable forms of PAH, and provides a unique resource for future large-scale discovery efforts in this disorder. Mutations in previously established genes accounted for 19.9% of PAH cases. Including new genes identified in this study (GDF2, ATP13A3, AQP1, SOX17), the total proportion of cases explained by mutations increased to 23.5%. It is likely that independent confirmation of the putative genes identified in this study will increase further the proportion of cases explained by mutations, but this will require larger international collaborations. The results suggest that the genetic architecture of PAH, beyond mutations in BMPR2, is characterised by substantial genetic heterogeneity and consists of rare heterozygous coding region mutations shared by small numbers of cases. The contribution of rare variation within non-coding regulatory regions to PAH aetiology remains to be determined. Our findings provide support for a central role of the pulmonary vascular endothelium in disease pathogenesis, and suggest new mechanisms that could be exploited therapeutically in this life-limiting disease. multiple primary tumours (7.8%), neuropathic pain disorder (2.6%), primary immune disorders (15.3%), primary membranoproliferative glomerulonephritis (2.3%), retinal dystrophies/paediatric neurology and metabolic disease (19.8%), stem cell and myeloid disorders (2.1%), steroid resistant nephrotic syndrome (3.6%), and others (0.3%), or their first degree relatives.
High-throughput sequencing DNA extracted from venous blood underwent whole genome sequencing using the Illumina TruSeq DNA PCR-Free Sample Preparation kit (Illumina Inc., San Diego, CA, USA) and Illumina HiSeq 2000 or HiSeq X sequencer, generating 100 -150 bp reads with a minimum coverage of 15X for ~95% of the genome (mean coverage of 35X).
Data pre-processing and generation of analysis-ready data sets Sequencing reads were pre-processed by Illumina with Isaac Aligner and Variant Caller (v2, Illumina Inc.) using human genome assembly GRCh37 as reference. Variants were normalised, merged into multi-sample VCF filed by chromosome using the gVCF aggregation tool agg (https://github.com/Illumina/agg) and annotated with Ensembl's Variant Effect Predictor (VEP). Annotations included minor allele frequencies from other control data sets (i.e. ExAC, 1000 Genomes Project and UK10K) as well as deleteriousness and conservation scores (i.e. CADD, SIFT, PolyPhen-2 and Gerp) enabling further filtering and assessment of the likely pathogenicity of variants 39,40,41,42,43,44,45 . To take forward only high quality calls, the pass frequency (proportion of samples containing alternate alleles that passed the original variant filtering) and call rate (proportion of samples with reference or alternate genotypes) were combined into the overall pass rate (OPR: pass frequency x call rate) and variants with an OPR of 80% or higher were retained.

Estimation of ethnicity and relatedness
We estimated the population structure and relatedness based on a representative set of SNPs using the R package GENESIS to perform PC-Air 46 and PC-Relate 47 , respectively. The selected 35,114 autosomal SNPs were present on Illumina genotyping arrays (HumanCoreExome-12v1.1, HumanCoreExome-24v1.0, HumanOmni2.5-8v1.1), do not overlap quality control excluded regions or multiallelic sites in the 1000 Genomes (1000G) Phase 3 dataset 40 , do not have any missing genotypes in NIHR BR-RD, had a MAF of 0.3 or above and LD pruning was performed using PLINK 48 with a window size of 50 bp, window shift of 5 bp and a variance inflation factor threshold of 2. The 2,110 samples from the 1000G Project including the European (EUR), African (AFR), South Asian (SAS) and East Asian (EAS) populations (excluding the admixed American population) were filtered for the selected SNPs and the filtered data were used to perform a principal component analysis (PCA) using PC-Air. We modelled the scores of the leading five principal components as data generated by a population specific multivariate Gaussian distribution and estimated the corresponding mean and covariance parameters. Genotypes from the NIHR BR-RD samples were projected onto the loadings for the leading five principal components from the 1000G PCA and we computed the likelihood that each sample belonged to each subpopulation under a mixture of multivariate Gaussians models. Each sample was allocated to the population with the highest likelihood, unless the highest likelihood was similar to likelihood values for other populations, as might be expected for example under admixed ancestry or if the sample came from a population not included in 1000G. Such ambiguous samples were labeled as "other". PC-Relate was used to to identify related individuals in NIHR BR-RD. We used the first 20 PCs from PC-Air to adjust for relatedness and extracted the pairwise Identity-By-State distances and kinship values. The pairwise information was used by Primus to infer family networks and calculate the maximum set of unrelated samples.

Cohort definition and allele frequency calculation
Based on the relatedness analysis, we defined the following sample subsets: (a) the maximum number of unrelated non-PAH controls (UPAHC, n=6385), (b) all affected PAH cases (PAHAFF, n=1048), and (c) all unrelated PAH index cases (PAHIDX, n=1038). These subsets were used to annotate the variants in the multi-sample VCF file with calculated minor allele frequencies using the fill-tags extension of BCFtools 49 .

Rare variant filtering
Filtering of rare variants was performed as follows: 1) variants with a MAF less than 1 in 10,000 in UPAHC subjects, UK10K and ExAC were retained (adjusted for X chromosome variants to 1 in 8,000); 2) variants with a combined annotation dependent depletion deleteriousness (CADD) score of less than 15 were excluded. CADD scores were calculated using the CADD web service (http://cadd.gs.washington.edu) for variants lacking a score; 3) premature truncating variants (PTVs) or missense variants of the canonical transcript were retained; 4) missense variants predicted to be both tolerated and benign by SIFT and PolyPhen-2, respectively, were removed.
To identify likely causative mutations (as reported in Table 2), variants in previously reported and putative genes, identified in this study, were examined in more detail to exclude variants that did not segregate in families (where data available). Furthermore, variants shared between cases and non-PAH controls, as well as variants of uncertain significance that cooccurred with previously reported causative mutations or high impact PTVs were also excluded.
Burden analysis of premature termination and missense variants Filtered variants were grouped per gene and consequence type (predicted PTV / missense) and subjects with at least one variant were counted (no double counting) per group and tested for association with disease. We applied a one-tailed Fisher's exact test with post hoc Bonferroni correction to calculate the P value for genome-wide significance.

Rare variant analysis using SKAT-O
To further investigate the aggregated effect that rare variants contribute to PAH aetiology, we applied a Sequence Kernel Association test (SKAT-O). SKAT-O increases the power of discovery under different inheritance models by combining variance-component and burden tests. Variants were filtered based on MAF as specified above, and only PTV and missense variants were included. For the analysis we implemented SKAT-O in RvTests v1.9.9 50 with default parameters and weights being Beta (1,25), and applying a correction for read length, gender and the first five principal components of the ethnicity PCA. Variants were collapsed considering only the protein-coding region in the canonical transcript of the protein-coding genes in the genome assembly GRCh37.

Analysis of large deletions
Copy number variation was identified using Canvas 51 and Manta 52 . Deletions called by both Manta and Canvas with a reciprocal overlap of ≥ 20% were retained. Of these, deletions were excluded if both failed standard Illumina quality metrics or overlapped with known benign deletions in healthy cohorts 53 . Deletions with a reciprocal overlap of ≥ 50% between samples were merged and filtered for a frequency of less than 1 in 1,000 in WGS10K and overlapping exonic regions of protein coding genes (GRCh37 genome assembly). The number of subjects with deletions were added up by gene (no double counting of subjects) and tested for association with the disease. We applied a one-tailed (greater) Fisher's exact test with Bonferroni post hoc correction for multiple testing to determine the P values for genome-wide significance.

Confirmation of variants
Variant sequencing reads for SNVs, indels and deletions were visualised for validation on Integrative Genomes Viewer (IGV) 54 , and were confirmed by diagnostic capture-based highthroughput sequencing, if the IGV inspection was not satisfactory.

Structural analysis of novel variants
The domain structures and the functional groups of the novel PAH genes were plotted according to the entry in UniProtKB. Clustal Omega was used for sequence alignment. Structural data were obtained from RCSB Protein Data Bank and analysed according to published reports. Figures were generated using PyMOL Molecular Graphics System.

Production of pGDF2 Wild Type and Variant Proteins
The cloning of human wild type pro-GDF2 (pGDF2) in pCEP4 has been described previously 55 . Site-directed mutagenesis was performed according to the manufacturer's instructions (QuickChange Site Directed Mutagenesis Kit, Agilent Technologies). Mutations were confirmed by Sanger sequencing. HEK-EBNA cells were transfected with plasmids containing either wild-type or mutant pGDF2 for 14 hours. The transfecting supernatant was removed and replaced with CDCHO media (Invitrogen) for 5 days to express the proteins. The conditioned media containing GDF2 and the variants were harvested and snap-frozen on dry-ice before being stored at -80 o C. For each variant, conditioned media from three independent transfections were collected for further characterisation.

Cell culture and treatments
Distal human pulmonary artery smooth muscle cells (PASMCs) were cultured from explants of small pulmonary vessels (<2mm diameter) dissected from lung resection specimens. The lung parenchyma was separated from a pulmonary arteriole following the arteriolar tree to isolate 0.5-to 2-mm-diameter vessels, which were excised, cut into small fragments, plated in T25 flasks and left to adhere for 2 h. Once adhered, the explants were propagated in DMEM/20% FBS plus A/A until cells had grown out and were forming confluent monolayers. PASMCs were trypsinized, and subsequent passages were propagated in DMEM supplemented with 10% heat-inactivated FBS and A/A and maintained at 37 o C in 95% air-5% CO2. The smooth muscle phenotype was confirmed by positive immunofluorescent staining using an antibody to smooth muscle specific alpha-actin (Clone IA4 Sigma-Aldrich; 1:100 dilution). A section of the pulmonary arteriole was collected, fixed in formalin and embedded in paraffin, and sections analysed to ensure that the vessel was of pulmonary origin. Papworth Hospital ethical review committee approved the use of the human tissues (Ethics Ref 08/H0304/56+5) and informed consent was obtained from all subjects.
Human blood outgrowth endothelial cells (BOECs) were generated from 40-80 ml peripheral blood isolated with informed consent from healthy controls. The study was approved by the Cambridgeshire 3 Research Ethics Committee, reference number 11/EE/0297. BOECs were cultured in EGM-2MV with 10% standard grade FBS (Life Technologies, Carlsbad, CA) and were used for experiments between passages 4 and 8 56 . The endothelial nature of BOECs was assessed by flow cytometry for endothelial surface marker expression as described previously 26 . All cell lines were routinely tested for mycoplasma contamination. For experiments involving BOEC generation, Human pulmonary artery endothelial cells (PAECs) were purchased from Lonza (Cat. No. CC-2530; Basel, Switzerland) and maintained in EGM-2 with 2% fetal bovine serum (FBS; Lonza). PAECs were used between passages 4 and 8. For experimental studies, cells were treated with EBM-2 containing Antibiotic-Antimycotic (A/A; 100U ml -1 penicillin, 100 mg ml -1 streptomycin and 0.25 mg ml -1 amphotericin B, Invitrogen, Renfrewshire, UK) and FBS concentrations as stated. Cell lines were routinely tested for mycoplasma contamination and only used if negative.
RNA preparation and quantitative reverse transcription-PCR.
Total RNA was extracted using the RNeasy Mini Kit with DNAse digestion (Qiagen, West Sussex, UK). cDNA was prepared from 1 µg of RNA using the High Capacity Reverse Transcriptase kit (Applied Biosystems, Foster City, CA), according to the manufacturer's instructions. All quantitative PCR reactions were prepared in MicroAmp optical 96-well reaction plates (Applied Biosystems) using 50 ng µl -1 cDNA with SYBR Green Jumpstart Taq Readymix (Sigma-Aldrich), ROX reference dye (Invitrogen) and custom sense and antisense primers (all 200 nmol l -1 ). Primers for human ACTB (encoding β-actin), AQP1, ATP13A3, B2M, HPRT and SOX17 were designed using PrimerBLAST (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) (Table S10). Reactions were amplified on a Quantstudio 6 Real-Time PCR system (Applied Biosystems). The relative abundance of each target gene in different cell lines was compared using the equation 2 -(CtGOI-Ct3HK) , where Ct3HK corresponded to the arithmetic mean of the Cts for ACTB, B2M and HPRT for each sample. For expression analysis of siRNA knockdown, the 2 -(ΔΔCt) method was used and fold expression determined relative to the DH1 control.

siRNA transfection
Prior to transfection, cells were preincubated in Opti-MEM-I reduced serum media (Invitrogen) for 2h before transfection with 10nM siRNA that had been lipoplexed for 20 min at RT with DharmaFECT1 (GE Dharmacon, Lafayette, CO). Cells were then incubated with the siRNA/DharmaFECT1 complexes for 4h at 37 o C before replaced by full growth media. Cells were kept in growth media for 24h before further treatment. Knockdown efficiency was confirmed by mRNA expression or immunoblotting. For proliferation assays, parallel RNA samples were collected both on day0 and day6, confirming that ATP13A3 expression was reduced by >90% on Day 0 and still reduced by >70% at Day 6. For all other assays, parallel RNA samples were collected on the day of the experiment to confirm knockdown, which was >90%. The siRNAs used were oligos targeting ATP13A3 (SASI_Hs02_00356805) from Sigma-Aldrich and ON-TARGETplus non-targeting Pool (siCP; GE Dharmacon).

Flow cytometric apoptosis assay
BOECs were plated 150,000/well into 6-well plates and transfected with siATP13A3 or siCP lipoplexed with DharmaFECT1. Cells were then serum-starved in EBM-2 (Lonza) containing 0.1% FBS and A/A for 8 hours before treating with EBM-2 and A/A containing either 0.1%FBS or 5%FBS for another 24 hours. Cells were then trypsinized and after washing with PBS, stained using the FITC Annexin V Apoptosis Dectection Kit I (BD Biosciences). For each condition, dual-staining of 5µl FITC conjugated Annexin V and 5µl propidium iodide (PI) were added and incubated at room temperature for 15 minutes. For the single staining controls for compensation, either 5µl FITC Annexin V or 5µl PI was added into non-    16 20 X 16 20 X     processing. The pre-pro-protein is processed into the mature growth factor domain (GFD) bound to the prodomain upon secretion 20 . (b) Plot of GDF2 mutations found only in PAH cases superimposed on the structure of prodomain bound GDF2. (pdb code 4YCG) 19 . The GDF2 growth factor domain is shown in green and the prodomain in cyan. (c) Magnified view of the Arg110 and Glu143 mutations. The wild type amino acids make double salt bridges to stabilise the prodomain conformation at the interface between the growth factor domain and prodomain. The E143K and R110W mutations both disrupt these interactions, destabilising the interaction between the growth factor domain and prodomain. (d) GDF2 levels secreted into supernatants of HEK293T cells transfected with likely pathogenic variants found in PAH cases, compared with wild type GDF2 and cells transfected with an empty vector. *** P < 0.001 by ANOVA.   with bovine AQP1, which has the high resolution (2.2Å) published structure. Mutations identified in PAH cases are highly conserved: Val176 and Arg195 [human AQP1 numbering is used] are highlighted in yellow. (b) Crystal structure of bovine AQP1 (pdb code 1j4n) 23 . Left: side view; right: top view from the extracellular direction. AQP1 is shown as a semitransparent cartoon and five water molecules in the water channel are shown as red circles. Key residues lining the water channels are shown as stick structures. (c) Magnified view of the water channel, with H-bonds connected to water molecules in the channel highlighted. Two NPA motifs, essential for the water transporting function of AQP1, are shown in magenta. Conserved His180 that constricts the water channel is shown in yellow. Mutations found in PAH cases, Arg195Trp and Val176Glu, are labelled and shown as orange stick structures. Arg195 and His180 are highly conserved in the known water channels and are strong indicators of water channel specificity. Arg195Trp and Val176Glu mutations are predicted to disrupt the conformation of this conserved water channel.