Identification of novel FBN1 variations implicated in congenital scoliosis

Congenital scoliosis (CS) is a form of scoliosis caused by congenital vertebral malformations. Genetic predisposition has been demonstrated in CS. We previously reported that TBX6 loss-of-function causes CS in a compound heterozygous model; however, this model can explain only 10% of CS. Many monogenic and polygenic CS genes remain to be elucidated. In this study, we analyzed exome sequencing (ES) data of 615 Chinese CS from the Deciphering Disorders Involving Scoliosis and COmorbidities (DISCO) project. Cosegregation studies for 103 familial CS identified a novel heterozygous nonsense variant, c.2649G>A (p.Trp883Ter) in FBN1. The association between FBN1 and CS was then analyzed by extracting FBN1 variants from ES data of 574 sporadic CS and 828 controls; 30 novel variants were identified and prioritized for further analyses. A mutational burden test showed that the deleterious FBN1 variants were significantly enriched in CS subjects (OR = 3.9, P = 0.03 by Fisher’s exact test). One missense variant, c.2613A>C (p.Leu871Phe) was recurrent in two unrelated CS subjects, and in vitro functional experiments for the variant suggest that FBN1 may contribute to CS by upregulating the transforming growth factor beta (TGF-β) signaling. Our study expanded the phenotypic spectrum of FBN1, and provided nove insights into the genetic etiology of CS.


Introduction
Congenital scoliosis (CS) is a form of scoliosis caused by congenital vertebral malformations potentially resulting from formation failure, segmentation defects, or a combination of both [1]. The prevalence of CS is~0.5-1 per 1000 live births [2]. As a major cause to infantile and adolescent disability, CS perturbs patient lives and daily activities both physically and psychologically [3].
Genetic predisposition has been demonstrated in CS [4,5]. CS is a consequence of single-gene mutations. We previously reported that a heterozygous 16p11.2 deletion or rare TBX6 loss-of-function (LoF) variants together with a common hypomorphic risk haplotype composed by three SNPs in trans cause CS [6]. We subsequently recapitulated this compound heterozygosity model in a gene-edited mouse [7], and defined the unique and clinically actionable phenotype of a monogenic form of CS, TBX6-associated CS (TACS) [8]. However, this model can explain only about 10% of CS [8,9]. Many other contributor genes remain to be elucidated.
Here, as a part of the Deciphering Disorders Involving Scoliosis and COmorbidities (DISCO) project, we conducted exome sequencing (ES) for a CS cohort. Trio-based analyses on familial cases identified a novel nonsense variant in FBN1, the gene implicated in Marfan syndrome (MFS; MIM#154700) and many monogenic diseases with scoliosis and spinal dysplasia. We then examined the effect of rare FBN1 variants on sporadic CS and observed that deleterious missense variants were significantly enriched in CS. Functional analyses of a recurrent FBN1 missense variant revealed the potential association between upregulation of transforming growth factor beta (TGF-β) signaling and CS. The subjects carrying highly deleterious FBN1 variants had vertebral malformations, malformations of the ribs, and intraspinal defects.

Participant recruitment
Initially, 615 Chinese CS subjects with complete clinical data and ES data were recruited at Peking Union Medical College Hospital (PUMCH) in China from 2010 to 2018, as a pivotal part of DISCO study (http://www.discostudy.org/). There were 103 familial cases with available samples for first-degree relatives. TBX6 mutations were examined and 41 cases molecularly diagnosed as TACS [6,8] were removed. Consequently, 574 CS cases were retained for further study. For the control subjects, 828 unrelated Chinese iindividuals without apparent spinal deformity were ascertained. The ethical committees or institutional review boards at PUMCH that contributed to the patient samples approved the study. Written informed consent was obtained from each participant or their guardians (for those younger than 16 years old).

Exome Sequencing and Sanger sequencing
Genomic DNA from peripheral blood samples was extracted using the DNeasy Blood & Tissue kit (Qiagen, Hilden, Germany). The ES data were processed and analyzed using the PUMCH Pipeline as previously described [17]. The following public available databases were used for mutation annotation: the 1000 Genomes Project (1KG; http://www.internationalgenome.org), the Exome Sequencing Project (ESP; http://evs.gs.washington.edu/ EVS/), Exome Aggregation Consortium (ExAC; http://exa c.broadinstitute.org/), and Genome Aggregation Database (gnomAD; http://gnomad-old.broadinstitute.org), Single Nucleotide Polymorphism database (dbSNP; https://www. ncbi.nlm.nih.gov/snp). In silico prediction tools, including Sorting Intolerant From Tolerant (SIFT) [18], Polymorphism Phenotyping v2 (Polyphen-2) [19], Mutation Taster [20], and Combined Annotation Dependent Depletion (CADD) [21] were utilized to predict deleterious properties of variants. We also annotated the detected variants using customized database based on Human Gene Mutation Database (HGMD; www.hgmd.cf.ac.uk/) and Online Mendelian Inheritance in Man (https://omim.org/). Empirically, we utilized the recommended filters of the Genome Analysis Toolkit (v2.2-3) prior to performing association analyses in order to guarantee that the risk of false negatives is minimized to select quality control passed variants [22]. Sanger sequencing by using an Applied Biosystem 3730xl DNA Analyzer (Life Technologies, CA, USA) was conducted to confirm the candidate variants identified by ES, and segregation analyses were performed in family members. The RefSeq accession numbers of the transcript and corresponding protein isoform of FBN1 we used for mutation nomenclature were NM_000138.4 and NP_000129.3, respectively.

Mutational burden analyses
Mutational burden analyses of FBN1 were implemented between 574 CS cases and 828 controls. To alleviate the biased factors attributable to differential sequencing coverage, we conducted harmonization analyses between case and control exomes. An individual RefSeq coding sequence site was excluded from the analysis if the absolute difference in percentages of cases compared with controls with adequate coverage of the site differed by >10%. This site-based pruning resulted in exclusion of 4.8% of the Refseq coding sequence sites. We also introduced a likely gene-disrupting (LGD) model [23] to prioritize the candidate variants. The LGD model is defined by clustering LoF variants (nonsense, splice-site, and insertion/deletion). A damaging missense (D-mis) model was also applied for variant prioritization. Dmis is defined by selection of D-mis variants with a predicted CADD score ≥ 20. In viewing of the dominant traits that FBN1 may have, inclusion criteria were strictly set to select the presumably LGD and D-mis variants to identify risk-conferring variants to CS. Variants that are not present at this time in 1KG, ESP, ExAC, dbSNP, the Universal Mutation Database for FBN1 (UMD-FBN1; http://www. umd.be/FBN1/) [24] were defined as "novel." Only novel variants were subjected to the burden analysis. We applied a collapsing method [25] to detect the association of mutational burden. The CADD score of 20, which corresponds to the top 1% of damage when evaluating all known allelic variants [26], was set as the cutoff value for creating a stratified variants subgroup for collapsing.

Construction of expression plasmids
We constructed a plasmid expressing full-length FBN1 (GenBank: NM_000138.4) cDNA with enhanced green fluorescent protein (EGFP) fusion, pEGFP-FBN1. A fulllength FBN1 cDNA having suitable restriction sites was PCR-amplified using KOD-Plus-Neo (Toyobo, Japan). The PCR amplicons were cloned into the NheI and SacII sites of the pEGFP-N1 expression vector (Clontech, Takara Bio, Japan). The construct for the Leu871Phe variant was generated by QuikChange Lightning Site-Directed Mutagenesis Kit (Agilent Technologies, CA, USA). All plasmid constructs were subjected to Sanger sequencing for verification.

Cell culture and transfection
HEK293T cells were cultured in Dulbecco's Modified Eagle Medium (DMEM) (Gibco, Waltham, MA, USA) supplemented with 10% FCS (Biological Industries, CT, USA) and 1% penicillin/streptomycin (Gibco) at 37°C with 5% CO 2 . Cells were seeded in a six-well plates and transfected with Lipofectamine™ 3000 Transfection Reagent (Invitrogen, CA, USA) according to the manufacturers' instructions. The medium was replaced with fresh DMEM culture medium 6 h after transfection. The cells were further incubated for 48 h.

Statistical analyses
A one-sided Fisher's exact test or a one-sided Chi-square test was used for all mutational burden analyses by using SPSS Statistics 23.0 (SPSS Inc., Al Monk, NY, USA), and a Student's t test was used to compare the differences of qPCR and WB results. All cell experiments were independently repeated three times with different cell lysates for each single assay, and data were presented as mean ± S.E. M. A P value < 0.05 was considered statistically significant.

Identification of a novel FBN1 null variant transmitted in a CS family
We applied ES to identify responsible variants with autosomal dominant inheritance in all 103 familial CS. We identified a maternally inherited LGD variant associated with the vertebral phenotypes in a family. The novel heterozygous LGD variant was c.2649G>A (p.Trp883Ter) in FBN1, which co-segregated with CS in a dominant family (Fig. 1a, b). The nonsense variant was considered to exert a significant LoF effect by triggering the nonsense-mediated mRNA decay mechanism and subsequently loss of FBN1 protein. The proband III-1 displayed CS with structural segmentation defect of the vertebrae (T10-L1) and the formation of vertebral bar on the left side (T10-L1) (Fig. 1c), whereas the mother II-2 exhibited severe congenital kyphosis with segmentation defects (Fig. 1d). Although the mitral valve prolapse with mild mitral insufficiency was found in the proband, no other apparent cardiovascular or ocular phenotypes featuring MFS were identified in the proband and his mother. The aortic root measurements of the proband still remained normal (Z score = 1.4) by echocardiographic measurements, and the diagnosis of MFS could not be implemented according to the Ghent Criteria [28]. As for now, no obvious "marfanoid" appearances are observed in the family members. i.e., III-1 and II-2 (Supplementary Table 1).
In addition, we did not observe any LoF variants other than the FBN1 p.Trp883Ter in any of the family members.

Novel deleterious coding variants in FBN1 are enriched in CS patients
Concerning the potential association between FBN1 and CS, we further compared the difference of the frequency of FBN1 variants between sporadic CS cases and controls. We subsequently identified a total of 30 novel variants (28 missense, 1 frameshift, and 1 splicing variant) in 16 out of the 574 CS cases and 14 out of 828 in-house controls.
When all 30 variants were evaluated by the collapsing method, no significant association with CS was observed (OR = 1.6, P = 0.15 by one-sided Chi-square test). Next, the variants were stratified based on their deleteriousness determined by the CADD score. In the group of CADD score < 20, a total of 19 variants were subjected to the collapsed analysis without mutation types stratification and no statistical difference was identified. Further stratification within this group based on mutation types did not show any statistical difference in any mutation types (Table 1). In contrast, in the group of CADD score ≥ 20, a total of 11 missense variants were identified; eight in CS and three in controls, respectively. The eight missense variants (including one recurrent variant) were significantly enriched in CS (OR = 3.9, P = 0.03 by onesided Fisher's exact test) ( Table 1). Sanger sequencing results of the eight cases are shown in Supplementary  Fig. 1. Intriguingly, we have also checked the FBN1 mutations in our study but did not identify any overlapping variants with the idiopathic scoliosis cohort reported by Buchan et al. [29].  Genetic and phenotypic analyses of the CS individuals harboring candidate variants The genetic information and in silico deleteriousness predictions of the eight variants are presented in Table 2. One missense variant: p.Leu871Phe (CADD score = 23.9) was recurrent in two unrelated CS subjects. Leu871 was located within the hybrid 2 motif region of the protein (Fig. 2), which is an evolutionary 'hybrid' between TGF-β binding protein-like (TB) and calcium-binding epidermal growth factor-like (cbEGF-like) domains [30]. Previous studies indicated that mutations situated within this domain disrupted the ratio of an α-helix to β-sheet leading to a more compact conformation [31]. In addition, conservation analysis of amino acid residue of Leu871 in FBN1 indicated that it is highly conserved throughout evolution and across many selected species, suggesting it is required for the normal function of the protein (Fig. S2). The remaining six missense variants predicted with highly deleterious properties were located in epidermal growth factor-like (EGFlike) domains or cbEGF-like domains (Fig. 2), playing a seminal role in diseases [32]. The eight subjects harbored the FBN1 variants displayed different types of vertebral malformations (Table 3). Malformations of the ribs were also found in three subjects (Table 3). Notably, intraspinal defects (diastematomyelia, syringomyelia, and tethered cord) were identified in five subjects (Table 3). Although mild mitral insufficiency and pectus carinatum deformity presented in XH902, aortic root measurements still remained normal (Z score = 1.6), and ocular phenotypes supporting MFS were not present. This patient would be categorized as "potential MFS" until the aorta reaches threshold (Z score = 3) [28]. The remaining seven subjects had no cardiovascular or ocular manifestations (Table 3), and hence they were not diagnosed as MFS. We have also re-evaluated the other marfanoid musculoskeletal phenotypes of these eight cases, and observed that no apparent manifestations except Patient XH902 with pectus carinatum deformity (Supplementary Table 2).

Functional study of the recurrent variant, Leu871Phe
We overexpressed either human FBN1 wild-type or p. Leu871Phe in HEK293T cells. Total RNA was reversely transcribed to cDNA 48 h after transfection. The mRNA expression level of the p.Leu871Phe FBN1 showed no statistical difference to that of the WT plasmid (P = 0.38 by Student's t test) (Fig. 3a).
Previous studies have demonstrated that deleterious FBN1 mutations upregulate endogenous TGF-β signaling, which could be measured via phosphorylation of Smad2 (p-Smad2), a downstream target [33]. Transfected cells with Table 2 Candidate

Discussion
In the present study, we conducted family-based ES analysis and allele-based mutational burden analysis, and provided evidence for the genetic association between FBN1 and CS. We then performed in vitro functional assay for a recurrent FBN1 variant (p.Leu871Phe) and revealed the upregulation of TGF-β signaling caused by the variant. Clinical features for the CS cases with FBN1 variants showed the expansion of the FBN1-related disease entity.
FBN1 encodes a connective tissue protein essential to extracellular microfibrils organization in dermal fibroblasts and skeletal muscle cells [34]. FBN1 is well known as a causal gene for MFS, which often has scoliosis without structural changes of the vertebrae and ribs. On the other hand, FBN1 mutations also cause various skeletal dysplasia, including Acromicric dysplasia (MIM #102370), Geleophysic dysplasia 2 (MIM #614185), Marfan lipodystrophy syndrome (MIM #616914), and Weill-Marchesani syndrome 2 (MIM #608328). These monogenic diseases have a broad range of vertebral phenotypes quite different from MFS. In a previous study, a novel FBN1 variant, p.Gly1796Glu co-segregated in a family with an autosomal dominant vertebral dysplasia featuring CS, in the absence of cardiac or ocular findings [35], indicating that FBN1 could be a disease gene for monogenic form of CS. Our case also confirmed that FBN1 causes CS as an autosomal dominant disease trait pattern.  The index case in our study was from an unusual threegeneration CS family segregating FBN1 LoF variant, c.2649G>A (p.Trp883Ter), which has been previously reported to be a pathogenic nonsense mutation associated with MFS [36], and deposited in HGMD (accession ID: CM161831). This nonsense substitution truncates the protein at codon 883, which is 1989 amino acids from the end of the protein. According to the ACMG guidelines for classifying pathogenic variants [37], this nonsense mutation is a convincing pathogenic LoF allele based on the following evidences: (a) PVS1: this nonsense variant is interpreted at the internal part of FBN1 gene where LoF is a known mechanism [38]; (b) PP1: cosegregation with CS phenotypes in the affected family members; (c)PP3: multiple computational evidence supports highly deleterious properties (Mutation Taster prediction shows deleterious; conservation prediction score by GERP is 4.81; CADD score is 42); (d)PP5: Franken et al. recently reported this variant as a pathogenic LoF allele associated with MFS, but the detailed evaluations for spinal/vertebral phenotypes were unavailable [36]. Notably, it has been widely reported that the high degree of clinical variability associated with FBN1 variants. For instance, distinct genetic mechanisms, encompassing a second deleterious variant in another gene or a polygenic model involving modifier loci, are proposed to account for this clinical variability [39]. Thus, the occurrence of a pathogenic variant in FBN1 is not necessarily associated with or produces MFS. Consistently with our findings, some FBN1 nonsense mutations (c.284C>A, p.Ser95Ter; c.1347_1348dupTA, p.Thr450IlefsTer130) only caused isolated skeletal features, with no cardiac or ocular findings [40]. Pathogenic FBN1 variants cause MFS but can also be found in patients presenting with apparently isolated features. For instance, the c.1453C>T, p.(Arg485-Cys) mutation in FBN1 has been identified in both autosomal dominant and recessive diseases characterized by a high extent of phenotypic variability [41,42]. Hence, it is reasonable to postulate that the c.2649G>A allele, which has been associated with MFS, could also account for the isolated CS in our study.
FBN1 is implicated in vertebral growth and development [43]. Fibrillin microfibrils in the calf are ubiquitously distributed in the vertebral growth plate and involved in growth factor binding, playing a pivotal role in regulating bone growth [43]. Moreover, a murine model with an intragenic duplication of Fbn1 presented markedly enlarged lumbar vertebra and elongated costal cartilages [44]. Functional follow-up of Fbn1 Tsk mouse unveiled the interactions of FBN1 with the Wnt signaling pathway [45] and its pivotal role for somitegenesis and vertebrate segmentation [4]. Since CS is induced by localized changes in vertebral body development [46], we proposed that factors distorting the normal localization process of the spine column may contribute to the pathogenesis of CS, such as FBN1.
TGF-β plays critical roles in development and maintenance of the skeleton [47]. The two unrelated patients (XH73 and XH579) sharing the p.Leu871Phe variant presented CS with vertebral segmentation defects, rib anomalies, and diastematomyelia. Our in vitro experimental results showed upregulation of the TGF-β signaling by the p. Leu871Phe variant, supporting the pathogenicity of the mutation. TGF-β signaling plays a critical role in cartilage and spine tissue development, upregulation of the TGF-β signaling pathway within the intervertebral disc could cause the occurrence of vertebral fusions [48]. The observations suggest that the upregulation of the TGF-β signaling pathway confer susceptibility to CS. In prior reports, the excess TGF-β activation and signaling have been frequently observed in FBN1-deficient humans and mice [29,49].
Regarding a mutational burden test for CS with a complex disease trait, the most important thing is not to focus disproportionately on specific variants, but rather to integrate across all types of risk-associated variants. In some individuals, risk may be caused by an unusual combination of common variants and rare variants, like the TACS compound inheritance model [6][7][8], whereas in others it may be due to an accumulation of rare or ultra-rare variants. In this context, it seemed actionable to stratify the FBN1 variants by the mutation types and CADD C-score prior to detecting the association by the collapsing method [25]. The knowledge about the variants associated with CS and the variable genetic penetrance will blaze the trail for genetic counseling and evaluating of the relatives of the variant carriers.
In conclusion, our observations indicated that FBN1 may be a susceptibility gene for CS, expanding our knowledge to the existing list of predisposition and candidate genes in CS phenotypes. It remains unknown why distinct mutations in the FBN1 gene can convey a variable continuum of skeletal disorders, ranging from the overlapping features to discordant phenotypes. Future studies focusing on exploring the underlying mechanisms of pleiotropic effects of FBN1 may be warranted.
Author contributions ML and NW performed the research, analyzed and interpreted the data. ML and NW drafted the manuscript. SZ, GL, YZH, CXY, YXZ, SL, JQL, YYY, YPC, XY, and BDT helped sample collection. LLW, YQZ, and SRW performed phenotyping of patients. ZHY and XZY helped analysis and interpretation of ES data. YCN and XXL provided technique support. YPW, SYZ, ZW, SI, GXQ, and JGZ offered professional discussions and instructions. JZS, JY, and HQZ helped bioinformatic analyses. ZHW and NW conceived and designed the study, revised the manuscript, and provided final approval of the manuscript.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons. org/licenses/by/4.0/.