The Unravelling of the Genetic Architecture of Plasminogen Deficiency and its Relation to Thrombotic Disease

Although plasminogen is a key protein in fibrinolysis and several mutations in the plasminogen gene (PLG) that result in plasminogen deficiency have been identified, reports on its association with the risk of thrombosis are conflicting. Our aim was to unravel the genetic architecture of PLG in families with plasminogen deficiency and its relationship with spontaneous thrombotic events in these families. A total of 13 individuals from 4 families were recruited. Their genetic risk profile of thromboembolism was characterized using the Thrombo inCode kit. Only one family presented a genetic risk of thromboembolism (a homozygous carrier of F12 rs1801020 and F13A1 rs5985). The whole PLG was sequenced using Next Generation Sequencing (NGS), and 5 putative pathogenic mutations were found (after in silico predictions) and associated with plasminogen deficiency. We did not find genetic risk factors of thrombosis in 3 of the 4 families, and the mutations associated with plasminogen deficiency did not cosegregate with thrombosis. Nevertheless, we cannot exclude plasminogen deficiency as a susceptibility risk factor for thrombosis, since thrombosis is a multifactorial and complex disease in which unknown genetic risk factors within these families, in addition to plasminogen deficiency, may explain the thrombotic tendency.

Plasminogen deficiency reduces the degradation of fibrin and thus affects wound healing. For example, ligneous conjunctivitis, congenital occlusive hydrocephalus and juvenile colloid milium have been reported 1,3. Among these, ligneous conjunctivitis is the most common and is characterized by chronic tearing, erythema of the conjunctiva and white, yellow-white, or red thick masses with a wood-like consistency. This chronic conjunctivitis is associated with homozygous or compound heterozygous type I patients. Of note, heterozygous type I and type II patients are asymptomatic 3. Some studies 6,11 suggested an association between plasminogen deficiency and thrombosis due to impaired fibrinolysis. However, these studies involved few patients with symptomatic thrombophilia. In addition, most family studies showed that only the probands had thrombosis. It is noteworthy that some risk factors of thrombosis, such as activated protein C (APC) resistance, were unknown when this association was reported 12. More recent reports support the hypothesis that this disorder by itself is not a risk factor for venous or arterial thrombosis 3,12,13. It has also been suggested that the diminished fibrinolytic capability of plasminogen may be compensated by the action of alternative enzymes in the blood 3.
In our study, we identified 4 Spanish patients who had suffered spontaneous thrombotic events. We identified a plasminogen deficiency in these patients and in several of their relatives. Our aims were to characterize the genetic risk factors of thromboembolism in these families and to detect the genetic mutations involved in the plasminogen deficiency. We used a new method to diagnose the deficiency based on Next Generation Sequencing (NGS) of PLG. We also evaluated the relation of these causal genetic mutations to the thrombotic outcomes to elucidate the role of plasminogen deficiency in thrombotic disease.

Subjects.
A total of 13 individuals from 4 Spanish families (A, B, C and D) were recruited at the Hospital General Universitario de Alicante. These families were selected through probands with a positive history of spontaneous thrombosis and plasminogen deficiency. The latter was defined as a functional activity below 72% (normal range, 72% to 127%). The characterization of these families is presented in Table 1. Clinical manifestations related to plasminogen deficiency were not identified and no classification of type I or type II was performed. All of the probands and one relative had experienced a spontaneous deep venous thrombosis of the legs or an arterial thrombotic event. The number of individuals per family varied from 1 to 5 and their ages ranged from 13 to 74 years. A total of 9 individuals were selected from the 4 families for genetic analysis, which included the probands and the individuals with the lowest functional plasminogen activity values. At least 2 relatives from every family were selected for inclusion, except for one family with no relatives.
The study was part of the routine clinical care of thrombophilia patients at the Hospital General Universitario de Alicante and was conducted in accordance with the Good Practice Guidelines, which included obtaining informed consent from all individuals prior to inclusion. The studies were conducted in accordance with the Declaration of Helsinki.
Blood Collection and Phenotype Determinations. Blood samples were obtained from the antecubital vein and anticoagulated with 3.8% sodium citrate in the proportion 1:10. The blood samples were centrifuged for 10 minutes at 3,500 g within 15 minutes after collection. The platelet-poor plasma (PPP) samples were immediately frozen and stored at −80 °C until tested. DNA was extracted from whole blood collected in EDTA using a standard salting out procedure 14. The functional plasminogen activity was measured in PPP using a chromogenic substrate assay, activated by tissue plasminogen activator (Instrumentation Laboratory, Werfen Group, MA, USA).

Table 2. The PCR primers were designed to specifically amplify the PLG and to avoid co-amplifications in view of the high homology between PLG, LPA family genes, pseudogenes and plasminogen-like genes 10. The overlapping PCR amplicons covered all of the exons, introns, 5′-UTR, 3′-UTR and approximately 1,500 bp of the 5′-promoter region. The designed PCR amplicons were tested for target specificity by Sanger sequencing as described below. LR-PCR amplicons were generated using the SequalPrep Long PCR Kit with dNTPs (Invitrogen, Thermo Fisher Scientific Inc., MA, USA). The LR-PCR mix solution contained ~50 ng of DNA, SequalPrep 1X reaction buffer, 0.4 μl of dimethylsulfoxide (DMSO), SequalPrep 1X enhancer B, 0.75 μM of forward and reverse primers and 1.8 units of SequalPrep Long Polymerase in a total volume of 20 μl. After initial denaturation at 94 °C for 2 minutes (min), 10 cycles of 94 °C for 10 seconds (sec), 64 °C for 30 sec, and 68 °C for 18 min were performed, followed by 22 cycles of 94 °C for 10 sec, 64 °C for 30 sec, and 68 °C for 18 min (+20 sec/cycle). Finally, an elongation step at 72 °C for 5 min was performed.
Every PCR amplicon was run on a 0.7% agarose gel and visualized using SYBR Safe (Invitrogen, Thermo Fisher Scientific Inc.). PCR amplicons were quantified using Qubit technology (Invitrogen, Thermo Fisher Scientific Inc.) and a normalized pool of the 7 PCR amplicons was prepared for each individual by mixing equimolar amounts. Finally, the PCR pools were adjusted to 0.2 ng/μl for preparing the libraries.
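The equimolar pooling and the final dilution to 0.2 ng/μl can be illustrated with a short calculation. This is a minimal sketch, not the procedure actually used in the study: the function names, the target amount per amplicon, the example concentrations and lengths are all hypothetical, and the conversion assumes an average mass of about 660 g/mol per base pair of double-stranded DNA.

```python
AVG_BP_MASS = 660.0  # assumed average g/mol per base pair of double-stranded DNA


def equimolar_volumes(amplicons, target_fmol=10.0):
    """Return the volume (in microlitres) of each amplicon that holds target_fmol.

    amplicons maps a name to (concentration in ng/ul, length in bp).
    Names and values are hypothetical; only the arithmetic matters.
    """
    volumes = {}
    for name, (conc_ng_per_ul, length_bp) in amplicons.items():
        if conc_ng_per_ul <= 0 or length_bp <= 0:
            raise ValueError(f"invalid concentration or length for {name}")
        # ng/ul -> fmol/ul: 1 ng of a fragment of length_bp is 1e6 / (bp * 660) fmol
        fmol_per_ul = conc_ng_per_ul * 1e6 / (length_bp * AVG_BP_MASS)
        volumes[name] = target_fmol / fmol_per_ul
    return volumes


def diluent_to_add(pool_conc_ng_per_ul, pool_volume_ul, target_conc=0.2):
    """Return the diluent volume (ul) needed to bring a pool down to target_conc."""
    if pool_conc_ng_per_ul < target_conc:
        raise ValueError("pool is already below the target concentration")
    final_volume = pool_volume_ul * pool_conc_ng_per_ul / target_conc
    return final_volume - pool_volume_ul


if __name__ == "__main__":
    # Hypothetical Qubit readings (ng/ul) and amplicon lengths (bp).
    example = {"amplicon_1": (40.0, 8000), "amplicon_2": (25.0, 9500)}
    for name, volume in equimolar_volumes(example).items():
        print(f"{name}: {volume:.2f} ul")
```

Longer amplicons need proportionally more mass to contribute the same number of molecules, which is why the pooled volume depends on both the measured concentration and the amplicon length.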
Library Preparation and NGS. The sequencing libraries were prepared from pooled PCR amplicons using the Nextera XT DNA Sample Preparation kit (Illumina, San Diego, CA, USA) with dual indexing, following the standard manufacturer's protocol. We obtained 9 paired-end libraries that were pooled and run simultaneously on an Illumina MiSeq sequencing system (Illumina) using the MiSeq Reagent Kit v2 (300 cycles; 2 × 150 bp paired-end) (Illumina).
Bioinformatic Analysis. Indexed sequences were de-multiplexed and analyzed individually. The NGS pipeline output, consisting of paired sequence files in FASTQ format, was used as input for the analysis with the CLC Genomics Workbench (v.6.5) software (CLC Bio-Qiagen, Aarhus, Denmark). The raw reads were trimmed using filters for length (minimum 25 bp; maximum 500 bp), ambiguous nucleotides (maximum 2) and quality score (0.05). The CLC Genomics Workbench software was used to align the trimmed reads against the human genome reference (hg19) and to perform in silico analysis. Read mapping was performed with specific parameter settings (mismatch count, 2; indel count, 3; length fraction, 0.7; similarity fraction, 0.9). The Indels and Structural Variants tool was used to identify insertions and deletions, inversions, translocations and tandem duplications, applying standard settings. Moreover, the adjusted parameters for quality-based variant detection were as follows: minimum coverage, 30x; minimum variant frequency, 25%. The quality-based variant detection results, exported as a variant call format (VCF) file, were used as input for the Illumina VariantStudio Data Analysis (v.2.1) Software (Illumina) to annotate variants.
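The two adjusted thresholds for quality-based variant detection (minimum coverage 30x and minimum variant frequency 25%) amount to a simple per-site test. The sketch below only illustrates those two thresholds; it is not the algorithm implemented in the commercial software, and the `VariantCall` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical summary of one candidate variant site."""
    position: int
    coverage: int        # total reads covering the site
    variant_reads: int   # reads supporting the variant allele

    @property
    def frequency(self) -> float:
        # Fraction of reads supporting the variant; 0 when the site is uncovered.
        return self.variant_reads / self.coverage if self.coverage else 0.0


def passes_quality_filter(call, min_coverage=30, min_frequency=0.25):
    """Keep a call only if it meets both thresholds quoted in the text."""
    return call.coverage >= min_coverage and call.frequency >= min_frequency


if __name__ == "__main__":
    calls = [
        VariantCall(position=101, coverage=212, variant_reads=100),
        VariantCall(position=202, coverage=29, variant_reads=29),
        VariantCall(position=303, coverage=100, variant_reads=24),
    ]
    kept = [c.position for c in calls if passes_quality_filter(c)]
    print(kept)
```

Both thresholds are treated as inclusive here, which is an assumption; the text states only the minimum values.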
Screening Putative Pathogenic Mutations. For structural variants, we used the following filter parameters to detect pathogenic variations: a) minimum variant ratio, 0.25, b) minimum mapping scores fraction, 0.6, c) consider intronic structural variants located < 30 bp from exon flanking boundaries, d) allele frequency from our own NGS variants database of 110 individuals, ≤ 5%, and e) co-segregation in the family.
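The five structural-variant criteria listed above can be read as a single conjunctive filter. The following is a hedged sketch of that reading: the `StructuralVariant` type and its field names are hypothetical, the thresholds are taken as inclusive, and exonic variants are assumed to be retained regardless of distance, none of which is stated explicitly in the text.

```python
from dataclasses import dataclass


@dataclass
class StructuralVariant:
    """Hypothetical record holding the values needed by criteria a) to e)."""
    variant_ratio: float             # a) fraction of reads supporting the variant
    mapping_score_fraction: float    # b) mapping scores fraction
    is_intronic: bool                # c) whether the variant lies in an intron
    distance_to_exon_bp: int         # c) distance to the nearest exon boundary
    inhouse_allele_frequency: float  # d) frequency in the in-house database
    cosegregates: bool               # e) co-segregation in the family


def is_candidate_pathogenic(sv):
    """Return True only when every criterion a) to e) is satisfied."""
    # Assumption: intronic variants must lie < 30 bp from an exon boundary,
    # while non-intronic variants are not subject to the distance rule.
    near_exon = (not sv.is_intronic) or sv.distance_to_exon_bp < 30
    return (
        sv.variant_ratio >= 0.25
        and sv.mapping_score_fraction >= 0.6
        and near_exon
        and sv.inhouse_allele_frequency <= 0.05
        and sv.cosegregates
    )
```

Because the criteria are combined with a logical AND, failing any single one removes the variant from further consideration.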
The following criteria were applied to identify putative pathogenic mutations within single nucleotide variants (SNV) and small insertions and deletions: a) whether the variant was rare: allele frequency ≤ 1% from 1000 Genomes (April 2012 v.3) 16

In Silico Analyses. In silico prediction of the functional effects of putative pathogenic mutations was performed using the Alamut Visual (v.2.6.1) software (Interactive Biosoftware, Rouen, France). Changes in splicing sites were predicted using the NNSplice and Human Splicing Finder tools. For the interpretation of splicing variants, a splicing site change was considered potentially deleterious when a variation of more than 10% between the native and the mutant scores was observed in both algorithms 18. Missense prediction tools included SIFT, PolyPhen-2, Align GVGD and Mutation Taster. Evolutionary conservation scores were obtained using phyloP and Grantham distances. The predicted structural effects of new missense mutations were investigated using the Project HOPE software 19.
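The splicing rule described above (a score change of more than 10% in both algorithms) can be written as a small check. This is a sketch under stated assumptions: the change is computed relative to the native score, which the text does not specify, and the function names and example scores are hypothetical rather than output from either tool.

```python
def relative_change(native_score, mutant_score):
    """Absolute change between native and mutant scores, relative to the native score."""
    if native_score == 0:
        raise ValueError("native score must be non-zero to compute a relative change")
    return abs(mutant_score - native_score) / abs(native_score)


def potentially_deleterious_splice_change(scores, threshold=0.10):
    """Return True when every tool reports a change above the threshold.

    scores maps a tool name to (native_score, mutant_score).
    """
    if not scores:
        raise ValueError("at least one tool score pair is required")
    return all(
        relative_change(native, mutant) > threshold
        for native, mutant in scores.values()
    )


if __name__ == "__main__":
    # Hypothetical scores; the two tools use different scales, so each
    # change is normalized by its own native score.
    example = {"NNSplice": (0.90, 0.70), "HSF": (85.0, 80.0)}
    print(potentially_deleterious_splice_change(example))
```

Requiring agreement between both tools makes the rule conservative: a large change reported by only one algorithm is not enough to flag the variant.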

Results
Genetic profile of thromboembolism. We characterized the alleles of 12 genetic variants located within the F2, F5, F12, F13A1, ABO, SERPINA10 and SERPINC1 loci, which are included in the Thrombo inCode (TiC) kit. Based on these results, all of the individuals were reported as not at genetic risk of thrombotic disease except for the proband of Family D (Table 3). This individual was homozygous for the allele c.-4T in F12 (rs1801020), also known as the 46T risk allele 15,20, and for the allele c.103G of F13A1 (rs5985), also known as the Val34 risk allele 15.

NGS results. We amplified 55,184 bp encompassing the PLG locus using LR-PCR from 9 individuals. These LR-PCR amplicons were analysed by NGS. Briefly, the percentage of the mapped positions with a depth of coverage above 30x was 95% and the median coverage per individual was 212x. In total, 237 potential structural variants and 301 unique SNV and small insertions and deletions were called. Among the latter, the percentages of indels and exonic variants were 12.6% (38) and 3.7% (11), respectively, and 55.5% (167) had an allele frequency ≤ 1% in the overall population of 1000 Genomes (April 2012 v.3) and in four populations of 1000 Genomes (American, East Asian, African and European). In addition, 47.8% (144) of the variants had not been reported in dbSNP (v.137), and 143 of these 144 were located within introns.
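The percentages quoted for the 301 called SNV and small indels follow directly from the counts given in parentheses. The short check below recomputes them; the dictionary labels are shorthand introduced here for illustration.

```python
TOTAL_SMALL_VARIANTS = 301  # unique SNV and small insertions/deletions called

# Counts as stated in the text; the keys are illustrative labels only.
COUNTS = {
    "indels": 38,
    "exonic": 11,
    "rare_in_1000_genomes": 167,
    "absent_from_dbsnp_137": 144,
}


def percent(count, total=TOTAL_SMALL_VARIANTS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


if __name__ == "__main__":
    for label, count in COUNTS.items():
        print(f"{label}: {count}/{TOTAL_SMALL_VARIANTS} = {percent(count)}%")
```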
Putative pathogenic mutations. We identified 5 putative pathogenic mutations, including 3 missense variations and 2 potential splicing site mutations (1 synonymous and 1 intronic variant) reported in Table 3  was available. Also, we performed in silico predictions (Table 4). Of note, no structural variant passed the filtering criteria for putative pathogenic structural variant detection. In Family A, which was composed of 3 members, the proband was compound heterozygous in trans for the p.Lys38Glu (in exon 2) and p.Gly712Arg (in exon 18) missense variations in PLG. This individual had a functional plasminogen activity level of 24%. The p.Lys38Glu variant was also found in the heterozygous state in one son, who had a functional plasminogen activity of 67%, while another son, who was heterozygous for the p.Gly712Arg variation, had a functional plasminogen activity of 58%. Of note, both missense variations have been described previously 1,3,21 in association with plasminogen deficiency.
The p.Lys38Glu variation was also identified in Family B. This missense mutation was detected in the heterozygous state in 4 of the 5 members. These 4 individuals included the proband, with a functional plasminogen activity level of 47%, and 3 family members with functional plasminogen activity levels of 58%, 50% and 67%. In contrast, the non-carrier family member showed a normal level of functional plasminogen activity (82%).
In Family C, the new putative pathogenic variation p.Arg261Cys (in exon 7) was heterozygous in 3 of the family members. The functional plasminogen activity was low in these individuals, who included the proband (68%), his sibling (64%) and his mother (64%), in comparison with the level detected in the maternal aunt of the proband, who was a non-carrier family member (101%). The effects of this missense mutation predicted by SIFT, PolyPhen-2, Align GVGD and Mutation Taster were deleterious, probably damaging, most likely interfering with protein function, and disease causing, respectively. In addition, the affected nucleotide was reported as highly conserved and there were large physicochemical differences between Arg and Cys. The Project HOPE software revealed differences in size and charge between the native and the mutant residues. The cysteine residue was smaller than the native arginine residue, and the positive charge of the arginine was lost as a result of this mutation. Furthermore, the mutated cysteine was located very close to a residue that makes a cysteine bond; although that residue is not itself mutated, the bond could be affected by the mutation located in its vicinity. The missense variation p.Arg261Cys was characterized as part of a Kringle domain with hydrolase activity, which is essential for the activity of the protein and is involved in protein-protein interactions.
The last proband (Family D) had a functional plasminogen activity of 55%. The mutational analysis showed that this individual was compound heterozygous for two potential splice site mutations: the p.Lys4Lys synonymous variant in exon 1, which was located in the signal peptide, and the c.1878-6T>C variant in intron 15. To investigate the functional consequences of these new putative pathogenic mutations in silico, predicted changes in splicing sites were analyzed. No alterations at the donor splice site of exon 1 or at the natural acceptor splice site of exon 16 were detected for the synonymous and the intronic variants, respectively.

Discussion
We present the molecular characterization of 13 individuals from 4 families with plasminogen deficiency, in which the probands had suffered a thrombotic event. Since there is doubt as to whether plasminogen deficiency is a risk factor for thrombotic disease by itself or in combination with other abnormalities 1, we genotyped 12 genetic risk factors of thrombosis within the F2, F5, F12, F13A1, ABO, SERPINA10 and SERPINC1 loci to evaluate the putative role of the plasminogen deficiency phenotype in thrombotic disease. We identified a genetic risk profile of thromboembolism in Family D. The proband was homozygous for the risk alleles in F12 (rs1801020) and F13A1 (rs5985). Interestingly, this individual was also positive for antiphospholipid antibodies, which might have acted as a risk factor in her ischemic stroke. In contrast, Families A, B and C did not show genetic risk profiles of thromboembolism despite suffering thrombotic episodes. Thus, plasminogen deficiency could not be ruled out as an additional risk factor. It is noteworthy that thromboembolism is a common disease with more than 60% of the variation attributable to genetic risk factors 22,23. However, genetic scores explain only 15% of the variance 15, so there is still a "missing heritability" that might have hampered the discrimination of plasminogen deficiency as an additional risk factor for thrombotic disease. Also, it is not known whether individuals of these families other than the probands will suffer a thrombotic event in the future. We sequenced the whole PLG locus with good coverage of high-quality reads to identify the genetic variants that might be involved in the functional plasminogen deficiency of these families using the NGS methodology. Recent advances in NGS have provided high-throughput, economical, sensitive and faster sequencing methodologies 24,25.
Because of these advantages, and the fact that many samples can be analyzed simultaneously, NGS is ideal for targeted diagnostic sequencing of monogenic diseases where genetic heterogeneity is expected. New pathogenic mutations have been discovered using NGS 26,27, but to our knowledge no publications have explored the genetic basis of plasminogen deficiency using NGS. We performed whole gene sequencing to exhaustively examine not only exons but also non-coding regions (promoter, introns, UTR) that might be involved in regulatory functions 28–30. In addition, LR-PCR provided a fast and cost-effective technique for avoiding the amplification of plasminogen pseudogenes 31,32. Also, LR-PCR in combination with paired-end sequencing allowed us to use algorithms to accurately detect structural variants and to avoid the use of additional analyses such as MLPA (multiplex ligation-dependent probe amplification) or MAPH (multiplex amplifiable probe hybridization).
We have identified 5 putative pathogenic mutations, which are in concordance with the levels of plasminogen functional activity in intrafamilial analyses. In particular, the mutation p.Lys38Glu (K19E, old nomenclature) has been widely described 1,3,21 as the most common molecular genetic defect in association with type I. This mutation has been identified in the homozygous, compound heterozygous and heterozygous states 4, but it is not known why this mutation and others in the PLG lead to diverse clinical conditions 8. In addition, the missense variation p.Gly712Arg (G693R, old nomenclature) has been described in the heterozygous state in relation to type II plasminogen deficiency 21. Interestingly, we have identified both the p.Lys38Glu and p.Gly712Arg variants as compound heterozygous in one individual. In contrast, to our knowledge the p.Arg261Cys variation is described here for the first time in association with plasminogen deficiency. This missense variation was classified as deleterious using in silico predictions. Several structural effects on the plasminogen protein have been suggested by Project HOPE. Specifically, changes in size and charge between the native and the mutated residue were predicted, which could cause loss of protein-protein interactions. In addition, the Project HOPE software predicted that the p.Arg261Cys mutation could disturb the interaction between domains, which might also affect the protein's function. This variant has been listed previously as variant 6:161137789 C/T in the Exome Aggregation Consortium (ExAC) Browser (Cambridge, MA, http://exac.broadinstitute.org) with an allele frequency of 5.771 × 10⁻⁵. It is noteworthy that another variation in the same amino acid (c.782G>A, p.Arg261His, rs4252187) has been annotated in NCBI dbSNP (v.146, http://www.ncbi.nlm.nih.gov/SNP/).
Also, we identified the synonymous p.Lys4Lys and the intronic c.1878-6T>C variations in one individual as new putative pathogenic mutations in association with plasminogen deficiency. Neither genetic variation was predicted by in silico analyses to change the natural splice site junction. Despite this, synonymous and intronic variations could cause alterations in protein expression and function 33,34. Thus, further analyses are needed to characterize these mutations functionally.
It is important to note that we identified the whole spectrum of genetic variability of the PLG structural gene in these families. However, we did not observe a clear association between the genotype, the number or the type of putative pathogenic mutations in PLG and the thrombotic phenotype of these individuals. Moreover, we could not find genetic risk factors of thrombosis in 3 of the 4 families. These observations leave us with a scenario in which we can neither confirm nor exclude plasminogen deficiency as a susceptibility risk factor for thrombosis.
Our view is that thrombosis is a multifactorial and complex disease in which several genetic risk factors, together with environmental situations, increase the susceptibility to develop a thromboembolic event. Therefore, in these families unknown genetic risk factors, in addition to plasminogen deficiency, may explain the thrombotic tendency in specific clinical situations. Thus, we believe that the evaluation of plasminogen deficiency may be useful in individuals who have suffered a spontaneous thrombotic event and have a negative routine thrombophilia test.
It is important to emphasize that, from a clinical point of view, we detected putative pathogenic mutations that explain the plasminogen deficiency. In this sense, our NGS approach clearly contributed to the genetic knowledge of plasminogen deficiency. We believe that NGS has the potential to identify and characterize the molecular basis of a wide variety of disorders in addition to plasminogen deficiency.