Human myelomeningocele risk and ultra-rare deleterious variants in genes associated with cilium, WNT-signaling, ECM, cytoskeleton and cell migration

Myelomeningocele (MMC) affects one in 1000 newborns annually worldwide and each surviving child faces tremendous lifetime medical and caregiving burdens. Both genetic and environmental factors contribute to disease risk but the mechanism is unclear. This study examined 506 MMC subjects for ultra-rare deleterious variants (URDVs, absent in gnomAD v2.1.1 controls that have Combined Annotation Dependent Depletion score ≥ 20) in candidate genes either known to cause abnormal neural tube closure in animals or previously associated with human MMC in the current study cohort. Approximately 70% of the study subjects carried one to nine URDVs among 302 candidate genes. Half of the study subjects carried heterozygous URDVs in multiple genes involved in the structure and/or function of cilium, cytoskeleton, extracellular matrix, WNT signaling, and/or cell migration. Another 20% of the study subjects carried heterozygous URDVs in candidate genes associated with gene transcription regulation, folate metabolism, or glucose metabolism. Presence of URDVs in the candidate genes involving these biological function groups may elevate the risk of developing myelomeningocele in the study cohort.

Successful neural tube formation relies on a series of orchestrated biological processes to facilitate convergent extension of neural ectodermal cells (NE) on the neural plate. NE cells at the midline and the dorsolateral regions form the medial hinge point (MHP) and the dorsolateral hinge point (DLHP) through mechanisms including apical construction and localized cell proliferation, and proliferation of non-neural ectodermal cells (NNE) that facilitate closing of the neural tube 1 (Fig. 1). These biological processes involve sensing signals in the extracellular matrix by signal receptors on the cell surface and sub-organelles (e.g. mechanochemical sensors on primary cilia) which subsequently trigger remodeling of cytoskeletal structures and orchestrate directional migration of NE and NNE cells 2 . The neural tube can fail to close due to impaired convergent extension, MHP or DLHP formation, structural integrity of NE, or midline fusion of NE or NNE 3 . Maintaining NNE integrity and controlling epithelial-to-mesenchymal-transition of neural crest cells 4 , and balancing proliferation of hindgut cells 5 (of endodermal origin) may also play important roles in the development of neural tube. Synchrony of all these processes together transform the neural plate into neural folds that meet and merge to form the neural tube. Factors interrupting the structure and/or function and spatial temporal integrity of these synchronous processes could result in failure of neural tube closure 1,3 . More than 400 genes have been identified in animal models that are required to maintain normal neural tube closure [6][7][8] . Many of the neural tube defect (NTD) animal model www.nature.com/scientificreports/ study. The majority of the homozygous defective mouse genes caused open neural tubes in the cranial regions whereas heterozygous mice had normal neural tube development 9 . Interestingly, some mice with a subset of these genes in the digenic heterozygote condition developed spina bifida, suggesting that oligogenic heterozygosity is a possible mechanism for spina bifida development 17 . Recent human spina bifida studies revealed cases carrying only heterozygous damaging variants of PCP genes and examples of subjects with digenic heterozygote rare variants in some PCP genes were observed 13 . In addition, some cases had damaging variants in two PCP genes each inherited from one of the parents 12 . Oligogenic inheritance is a demonstrated disease mechanism in many human diseases including the birth defect with holoprosencephaly 18 . Representation of oligogenic heterozygous URDVs in two or more NTD candidate genes found in MMC subjects in the study will be reported. Ontology analysis suggested URDV-containing NTD candidate genes were enriched for the same ontology group or in multiple categories including cilium, cytoskeleton, extracellular matrix, WNT signaling, and cell migration. The potential contribution of the observed results to MMC development in humans will be discussed.  Table 2). These variants were defined as ultrarare functionally deleterious SNVs (URDVs) and used for further analysis. Each study subject had approximately 29 URDVs with range between 9 and 90.

Results
Eighty-eight subjects were re-sequenced using the Illumina NGS platforms. Of the 30,789 SNVs in 33 EA and 37,924 SNVs in 55 MA identified from the Ion Proton platform in this study, over 98.5% were also identified by the Illumina NGS platforms.
Landscape of URVs in NTD candidate genes in human MMC exomes. There are several hundred genes known to cause NTDs in mice under monogenic and oligogenic conditions 3,9 . We sought insight through examining SNVs in 551 NTD candidate genes (see Supplementary Table 1) known to cause abnormal neural tube closure in animal models (537 genes), and 14 genes associated with human MMC in our study cohort. A total of 2134 SNVs in 440 genes were identified from EA MMC exomes and 2496 SNV in 447 genes were identified from MA MMC exomes.
Ultra-rare functional deleterious SNVs (URDVs) in candidate genes. Many studies have suggested the impact of variants is inversely proportionate to the alternate allele frequencies (aAF) 14 . Using aAF of variants in NFE/AMR gnomAD (controls) exomes as a filter, all 506 MMC subjects carried 31-69 rare deleterious variants (RDVs, aAF ≤ 0.01) in NTD candidate genes whereas the number of URDVs per subject varied between zero and eight (see Fig. 2 Table 3). With increased findings of novel and very rare variants in PCP genes associated with human NTDs in many previous studies 3, 19 , we chose to focus on the novel URDVs, which are more likely to have higher effect size, for subsequent analyses.

& Supplementary
Among 254 EA MMC subjects there were 307 URDVs in 202 genes. On average, each URDV-containing gene in EA MMC group had 1.52 URDVs. The range of URDVs per gene found in EA MMC was zero to six. Among 252 MA MMC subjects, there were 331 URDVs found in 203 genes. An average of 1.63 URDVs in a URDVcontaining NTD candidate gene was identified in MA MMC group. The range of URDVs per gene found in MA MMC subjects was zero to nine. Over 97% of URDVs had an allele count (AC) of one and a few had two to three. The majority (~ 95%) of URDVs were missense changes while the remaining ~ 3.4% were stop_gained or splice site changes (1.3% and 2.4% in EA and MA groups respectively). Around 70% of MMC subjects in both ethnic groups had at least one URDV in an NTD candidate gene, with the highest URDV counts being eight and five in EA MMC and MA MMC subjects, respectively (Fig. 2). There were 78 EA MMC subjects and 67 MA MMC subjects who had no URDVs in NTD candidate genes. URDV-containing NTD candidate genes were either common to both ethnicities or found uniquely in EA MMC and MA MMC exomes. There were 99 genes unique to EA MMC, 100 genes unique to MA MMC and 103 genes were common to both ethnic groups. Examining EA MMC and MA MMC respectively, most (66.7% and 63.4%) of the genes had one URDVs; 21.3% and 22.1% had two; 10.9% and 7.4% had three and the remaining few had four to nine URDVs. For the entire study cohort, 636 URDVs with 603 missense, 21 stop_gained and 12 splice site changes in 302 NTD candidate genes were discovered (Supplementary Table 1). These URDVs were manually verified using the online Integrative Genomics Viewer (IGV) Web App (https ://igv.org/app). Seven subjects carried two heterozygous URDVs in their candidate genes (Supplementary Table 3 www.nature.com/scientificreports/ Nine genes in MA MMC subjects had a higher number of URDVs, ranging from four to nine, with the most in FREM2. In EA MMC subjects, eight genes had four to six URDVs with six in APOB and in FREM2. Three genes (EP300, HSPG2, and FREM2) found in both EA and MA MMC subjects had more than ten URDVs with FREM2 and EP300 had an URDV density around 1.5. In this study, the number of EA MMC subjects is 254 and the number of MA MMC subjects is 252. Approximately 25-30% subjects had no URDVs in NTD candidate genes, around 35% had one URDV, and the remaining had more than one. (b) Shows counts of URDVs discovered per NTD candidate gene ranged from one to nine with nearly 65% genes containing only one. Total number of URDV-containing NTD candidate genes found in EA and MA MMC subjects are 202 and 203 respectively. www.nature.com/scientificreports/ www.nature.com/scientificreports/ Ontology Enrichment of NTD candidate genes with URDVs in MMC exomes. Among the 551 NTD candidate genes, enrichment of components in cilium, cytoskeleton, extracellular matrix and cell migration/adhesion, and WNT signaling were suggested (Supplementary Table 4). Using the ToppCluster 20 online program, genes between EA and MA groups consisting of URDVs were compared to the group of genes lacking URDVs using Bonferroni correction for multiple testing. Analysis results showed genes in EA and MA groups had higher folds of enrichment in ontology classes than those of the overall 551 candidate genes. However, there was very distinct overrepresentation of subclasses of ontology for URDV-containing candidate genes in the subjects and these subclasses were absent among the candidate genes without URDVs ( Fig. 4 and Supplementary Table 4). Enrichment of cellular component genes containing URDVs was seen mostly among ciliary components and intraciliary transport, actin filament, and microtubule cytoskeleton organization in both ethnic groups (Fig. 4a, and Supplementary Table 4). URDV-containing genes in EA MMC subjects were associated with components found in neuronal cell body, perinuclear region of cytoplasm, ciliary transition fiber, and WNT/SHH signaling. URDV-containing genes in MA MMC subjects were associated with components of ciliary basal body actinbased cytoskeleton and cell projection and cell-cell junctions. Differences in cellular component genes with URDV between EA and MA subjects were observed. Molecular function analysis showed enrichment of URDV-containing candidate genes with molecular functions in transcription modulation, beta-catenin binding, cell adhesion, and cytoskeleton binding in both EA and MA MMC exomes (Fig. 4b, and Supplementary Table 4). EA MMC cases had enriched candidate genes related to hedgehog family protein and WNT signaling molecule binding. MA MMC cases had enriched candidate genes involved in DNA binding with transcription factors and nuclear hormone receptors. Together, nearly half of the MMC subjects in the study cohort had one or more URDVs in the candidate genes belonging to the ontological groups associated with ciliary structures and functions.
Both EA and MA MMC exomes consistently had URDV-containing genes disproportionately represented in the Hedgehog/WNT signaling and cancer signaling pathways (Fig. 4c, and Supplementary Table 4). Of note, EA MMC exomes uniquely enriched with URDVs in genes from WNT/PCP, SHH and TGFβ signaling pathways. MA MMC exomes were uniquely enriched with URDVs in genes related to thyroid hormone signaling, NOTCH signaling, and anchoring ciliary basal body.
Most of the NTD candidate genes involve biological processes of embryo development of organs particularly for the central nervous system and other organ systems (Supplementary Table 4). Ontology terms for biological processes are also consistent with those described in the cellular components and molecular functions sections (Fig. 4d).
Ciliary structure and functions are closely associated with GO terms for cytoskeleton/microtubules, extracellular matrix, WNT signaling, and migration of cells and cellular components. To further dissect the properties of URDV-containing NTD candidate genes and to examine the URDV distribution in MMC exomes, we performed analysis of genes assigned to these five GO terms (Supplementary Table 5). Candidate genes in each of these ontological groups consisted a range of 12-23.2% of the total URDVs (Table 1). Together, nearly half of the MMC subjects in the study cohort have one or more URDVs in the candidate genes belonging to the five ontological groups. There were 113 and 110 NTD candidate genes with URDVs respectively in the EA and MA subject groups that were associated with functions for cilium, ECM, cytoskeleton and WNT signaling and cell migration. Of these genes, some belong in only one GO term and many in two to four GO terms with cross-interacting roles (Fig. 5, and Supplementary Table 6). Of 364 subjects in the study cohort who bore at least one URDV in an NTD candidate gene, nearly 75% had URDVs affecting genes associated with cilium, WNT signaling, ECM, cytoskeleton and cell migration remodeling suggesting they may confer increased genetic risk to MMC development. The remaining 25% of MMC subjects had URDV-containing NTD candidate genes that were associated with other gene functions such as transcription modulation, folate one carbon metabolism network and glucose oxidative stress 21 not related to the ontologies examined in this study.
Subjects had multiple URDV-containing genes affecting one or multiple ontologies. Recently published articles demonstrated the genetic contribution in individuals affected with NTDs that were potentially digenic heterozygote with de novo and/or rare deleterious variants inherited from each parent 12 . In this study, nearly 30% of MMC subjects had two or more URDVs in different NTD candidate genes belonging to the same and/or different ontologies. Overall, 85 EA MMC and 91 MA MMC exomes had multiple URDV-containing genes classified with different ontologies (Supplementary Table 3). Approximately 40% of these subjects had URDV-containing NTD candidate genes representing various combinations of the five ontologies presented above.
Oligogenic heterozygote URDVs involving the same ontology. A subset of MMC subjects including 25 EA and 21 MA were carrying two to three URDV-containing candidate genes that belong to the same ontology group ( Table 2). Nine EA MMC and four MA MMC subjects had URDVs found in two candidate genes associated with cilia structure/function ( Table 2). Of note, many cilium gene products also play roles in one or more of the other four ontologies examined here. Two EA MMC and five MA MMC subjects had URDVs in two genes with functions associated with WNT signaling (Table 2). Eleven EA MMC and nine MA MMC subjects had URDVs found in two genes with functions associated with cytoskeleton structures and functions (Table 2). Three EA MMC and one MA MMC subjects had URDVs found in two or three genes with functions associated with ECM structure and functions genes (Table 2). Finally, seven EA MMC and 10 MA MMC subjects had URDVs found in two or more candidate genes regulating cell migration ( www.nature.com/scientificreports/   Table 2). In addition, over 80% of the 302 URDV-containing NTD candidate genes caused embryonic lethality on or before neural tube closure in knockout mice of these genes 9 .

Discussion
This study revealed that 70% of the 506 MMC subjects consisting of the two ethnic groups with the highest prevalence of myelomeningocele in North America carry ultra-rare deleterious variants (URDVs) in 302 genes previously demonstrated to cause NTD phenotypes in animal models 9 or associated with human NTDs (Supplementary Table 2). Ontology enrichment analyses showed around 50% of subjects in the cohort have one or more URDV-containing NTD candidate genes potentially impairing the structure and/or function of one or more ontological groups including cilium, SHH and WNT signaling, remodeling ECM and cytoskeleton, and cell migration. Normal structure and function of these genes are necessary for successful closure of the neural tube in animals 1,3 . The study results shown here may have a high impact on our understanding of the genetic mechanisms of human MMC. One-fifth of MMC subjects in the study had URDVs in NTD candidate genes affecting cilium structure and/ or function, suggesting human myelomeningocele risk is very likely associated with ciliopathy. The extent of URDV-containing ciliopathy genes associated with human NTD cases is not known. A review article by Vogel et al. (2012) noted that some human syndromes such as Meckel-Gruber syndrome and Joubert Syndrome that present with NTDs were associated with ciliopathies and recommended genetic screening of ciliopathy genes   . DYNC2H1-HTT). It is likely that two genes within the same sub-functional group could contribute to digenic impairment of ciliary structure and/ or functions critical to neural tube closure increasing the risk of MMC in these cases. The PCP pathway modulates cellular functions such as ciliogenesis, actomyosin cytoskeleton and microtubules remodeling that are critical to neural tube formation 1,31-36 . Normal ciliogenesis is important for noncanonical WNT PCP signaling 37 . Many studies showed defects in cilia can lead to canonical WNT signaling over-activation 29 . Furthermore, some evidence suggests cilium is a potential regulator (molecular switch) between canonical and non-canonical WNT signaling 38 . Neural tube formation involves biomechanical mechanisms resulting in mediolateral convergence and rostro-caudal extension of neural plate and axial tissues 1 . The convergent extension process depends on normal non-canonical WNT/PCP pathway activity. WNT/PCP activity influences remodeling of the cytoskeleton that drives events such as reshaping cells and bending the neural plate. Deleterious genetic variants disrupting normal function of core PCP maintenance genes (e.g. CELSR1, DACT1, SCRIB and VANGL2) has been associated with ~ 20% of craniorachischisis cases and 8% of spina bifida cases 3,39 . The extent of non-canonical WNT/PCP gene variants identified from subjects affected by various types of NTDs was recently published 19 . This study identified 7% of total MMC subjects with one or more URDVs affecting the WNT/PCP signaling pathway and 3-4% affecting core PCP pathway genes involved in neural tube closure. A handful of WNT signaling genes were shown to have higher mutational burden in the current study cohort 40 . Two URDVs [i.e. VANGL1 p.(V239I) and LRP6 p.(Y544C)] in two Mexican American subjects in the study cohort had been shown to cause functional loss of the proteins 41,42 . Discovery of heterozygous digenic deleterious variants in human NTD cases with spina bifida (i.e. PTK7/ SCRIB) and anencephaly (i.e. CELSR1/ SCRIB; CELSR1/DVL3) strongly support the strategy of screening PCP genes to discover risk alleles contributing to human NTDs 3,12 . Here, we identified seven new digenic combinations of WNT/PCP signaling genes with three of these combinations (i.e. CELSR1-LRP6, SMURF2-LRP6, and PTK7-VANGL1) involving the core PCP pathway. Of these, three subjects had URDVs potentially interfering with regulation of planar polarity establishment (GO:0090175). These new gene combinations suggest previously unknown gene-gene interaction candidates for validation with animal studies in the future.
Knowledge of the extent of genetic risk of cytoskeleton structure function components to human MMC is limited. A quarter of the study subjects have one or more URDVs in 65 NTD candidate genes coding for cytoskeleton components to make microtubule structures: actomyosin filaments forming cell projections such Table 1. Distribution of URDVs and genes in five biological ontology groups among MMC subjects. Note. Count of URDV, or neural tube defect candidate gene, or subjects with myelomeningocele are shown with the percentage to the total count of each column in bracket. Some genes are classified to more than one category. Count of URDVs, or gene or individual are presented follow by the percentage of the count presented in bracket. Some genes can be assigned to more than one ontology categories. A unique subtotal count shows the subtotal number of the URDV, or gene, or subject without double counting when a gene has multiple ontology groups assigned. Not in above -represent URDVs, genes or subjects outside the five ontologies. URDVultra-rare deleterious variant, EA-European American subject, MA-Mexican American subject. ECMextracellular matrix. Nearly 15% of the subjects in this study had URDVs in NTD candidate genes coding for ECM components. The ECM compartment serves as a microenvironment retaining growth factors and signaling molecules (e.g. SHH, BMP, WNT, EGFs and FGFs) to recruit or influence neuroepithelial cells to determine shape, size, polarity and migration status for neural tube closing 28 . The roles of ECM components on regulating the position of the nucleus, cell shape, proliferation, migration, and morphogenesis of neural tube has been well-discussed 1 . Loss of function in some ECM components (e.g. Frem2 and Hspg2) caused NTDs in animal models 9 . This study is the first to reveal URDVs in FREM2, which were present in around 3% of study subjects, and URDVs in HSPG2, which were present in 2% of study subjects. Pathogenic variants in FREM2 have been associated with Cryptophthalmos (#123,570) and Fraser syndrome 2 (#617,666) with skull abnormalities or encephaloceles 44 . Digenic heterozygotes in two to four ECM genes were present in eight subjects in this study.
URDVs in cell migration genes were present in over 20% of the study subjects with many of these genes also involved in cilia, cytoskeleton, ECM, and WNT signaling. Ventral-dorsal migration of cells into the neural fold and rapid cell proliferation contribute to bending of the dorsolateral spinal neural plate, facilitating neural tube closure 45 . The ct mouse mutant has increased hindgut cell proliferation due to abnormal Grhl3 expression at the caudal region which surpasses NE and NNE migration and proliferation and prevents spinal neural tube closure 5,46 . Knockdown of the cell migration molecule Itgb1 prevents closure of the neuropore 47 . Mice with ablation of Rac1, a molecule modulating actin cytoskeleton remodeling and cell migration in NNE, developed open spina bifida, exencephaly or anencephaly 48 . In addition, impaired mesodermal cell migration due to diabetic pregnancy led to NTDs in mouse 49 . Seventeen MMC subjects in this study had oligogenic heterozygous URDVs in genes involving in cell migration providing examples for testing oligogenic heterozygote effects on cell migration as a mechanism of MMC development.
Knowledge of oligogenic gene-gene interaction between genes of different ontologies contributing to risk of myelomeningocele is lacking. Oligogenic inheritance in holoprosencephaly, a rare birth defect of the brain, was recently demonstrated 18 . Evidence of mouse embryos with oligogenic heterozygous PCP related genes knockout developed spina bifida suggesting a subject having heterozygous URDVs in two NTD candidate genes  www.nature.com/scientificreports/ www.nature.com/scientificreports/ www.nature.com/scientificreports/ belonging to the same ontology group may increase risk of MMC development 9,17 . So far, no non-syndromic human spina bifida cases with homozygous deleterious variants in one gene has been found, and only cases with digenic heterozygous rare deleterious variants in two PCP genes had been reported 12,13 . This study cohort showed examples of MMC subjects carrying multiple NTD candidate genes containing URDVs and one subject has two compound heterozygous URDVs in the KIF7 gene. Figure 6 summarized the potential impact of URDVs in NTD candidate genes identified from 506 MMC subjects. Hypothesis-free testing of random combinations of gene-gene interaction in animal models is too labor intensive and cost-prohibitive. While MMC risk for some subjects might be rationalized by disrupted genetic interaction between URDV-containing genes within the same ontology, subjects with URDV disrupted genes in different ontologies will be challenging. Here, the study results provide examples of new highly probable genetic interacting partners elevating MMC risk through an oligogenic Table 2. Subjects with multiple URDVs in candidate genes classified with one or more ontological groups. Note: EA-European American subjects, MA-Mexican American subjects. URDV-ultra-rare deleterious single nucleotide variant defined in Materials and Method section. Gene name-is the symbol of gene recommended by the Human Genome Organization Gene Nomenclature Committee (HGNC). Gene assigned into cilium, WNT-signaling, extracellular matrix (ECM), cytoskeleton and cell migration was described in Materials and Methods, and Supplementary Table 3. C-Score-Combined Annotation Dependent Depletion (CADD) score is a broadly applicable metric that objectively weights and integrates diverse information that integrates multiple functional annotations into one metric by contrasting variants that survived natural selection with simulated mutations. has not yet been demonstrated, it has been shown that inactivated canonical WNT signaling facilitated nuclear entry of miR-133a and complexed with AGO2 to suppress DnmT3b expression in mouse HL-1 cells 50 . Also, the presence of p.Y544C impaired LRP6 localization to plasma membrane for facilitating WNT signaling 42 . Subject D22-469 carries URDVs p.(N35K) and p.(H31R) respectively in SNX3 and CITED2, known components for WNT signaling and cell migration respectively. SNX3 binds WLS to regulate recycling of WLS thereby regulating the level of WNT excretion. Furthermore, it has been demonstrated that SNX3 with the subject's URDV p.(N35K) failed to interact with WLS affecting WNT sorting and secretion 51 . A recent report demonstrated the developmental defects of Cited2 morphants in Zebrafish could be rescued with exogenous WNT5A and WNT11 52 . This study identified URDVs in NTD candidate genes in 70% of MMC subjects lending strong support to the approach of screening NTD candidate genes to identify high impact genetic risk factors associated with human MMC development. Results of oligogenic URDV-containing cases present in this study provide a rational basis to test oligo-genic combinations in animal models to elucidate the mechanism(s) of MMC in humans. New oligogenic disease mechanism(s) for MMC development can be discovered from the new combinations of genes presented in this study and not previously known to cause NTDs. Testing genetic interactions between URDVcontaining NTD candidate genes discovered from MMC subjects to understand human MMC mechanism is sound and promising.
In the study, we observed the presence of URDV-containing candidate genes that were common to EA and MA, and unique to one or the other ethnic population. Many of the genes unique to EA or MA belongs to the same ontology subclasses and a small number of genes belong to distinct specific subclasses. Differences in subclass information may help to reveal the disease mechanisms for the subjects who carried these variants. The potential of utilizing the observed differences in these genetic variants to predict differential risks in different ethnic populations may require replication with additional studies.
The study results were limited to examining the impact of ultra-rare SNVs in the coding region and the splice sites of NTD candidate genes. The impact of URDVs in genes not known to cause NTDs in mice has not been evaluated. Yet there are a significant portion of genes which have important biological function and are detected in the developmental stages when the neural tube closes that can affect neural tube development. Importance of rare deleterious variants, out of frame indel variants, deleterious non-coding variants and copy number variants should be recognized. Previous genetic association studies demonstrated some impact of common and rare deleterious variants to NTD risk in humans but generally lack replication due to various limitations of association study designs, ethnic background, and sample size. This study cohort has similar limitations. Separate analysis using all the common variants identified from the subject exomes did not yield significant association with the selected NTD candidate genes after correcting for multiple testing. Figure 6. Summary of proportion of study subjects with URDV-containing NTD candidate genes constituting the components of cilium structure and function, WNT-signaling, cytoskeleton remodeling, extracellular matrix (ECM) remodeling and cell migration. URDVs were identified in the DNAs extracted from blood lymphocytes and expected to be present in all body cell types including both neural and non-neural ectodermal cells (NE and NNE respectively). Identifying the URDVs is the first of many steps leading to the understanding of the genetic mechanisms of human MMC development. Cross interaction of genes within and between the five groups is anticipated with many genes assigned to more than one of these groups. Presence of multiple genes with URDVs in one subject provide basis for testing potential gene-gene interaction that may interrupt cell proliferation, or convergent-extension, or apical constriction, or midline fusion or a combination of these processes leading to MMC. Exome sequencing and variant annotation. Exome library probes were made from an in-house design based on TargetSeq (Life Technologies, Inc.) with addition of splice sites, UTRs, small non-coding RNAs (e.g., microRNAs) and a selection of miRNA binding sites, and 200 bp promoter regions. High quality genomic DNA samples were processed using the exome library probes and the captured DNA products were sequenced following the manufacturer's standard protocol for multiplexed sequencing using the P1 chip on the Ion Proton platform (Life Technologies, Inc.). Quality of sequencing was maintained at 40-60 million reads with read length between 120 and 150 bases, and over 75% reads were on-target for all successfully sequenced samples. Other quality control measures were implemented to map 45-60,000 SNP per sample with ~ 50% heterozygote variants and the Ts/Tv ratio average of 2.5. Samples that failed to meet the above quality criteria were re-sequenced. If re-sequencing failed, another subject's DNA will be used to reach the goal of sequencing at least 500 subjects. For sequence data that passed variant level and sample level quality filters, variant calling was conducted using Genome Analysis Toolkit GATK HaplotypeCaller version 3.x following best practice guidelines 55 . Additional variant filtering steps had been described previously 21,40 . Briefly, only variants designated a "PASS" by Variant Quality Score Recalibration (VQSR), met map quality score < 20, and having inbreeding coefficient < -0.3 were retained for further analysis. Individual sample filters were used to ensure that only high-fidelity variants with alternate allele depth > 25%, a read depth > 10, and genotype quality score > 20 were analyzed. Allele count (AC), allele number (AN), and allele frequency (AF) were recalculated for individual ethnicities after the filtering processes. Filtered high quality single-nucleotide variants (SNVs) were annotated using the non-synonymous single-nucleotide variant functional predictions (dbNSFP) 56 version 4.0 database for all functional prediction information publicly available. Further analysis was focused on SNVs leading to stop_gained, stop_lost, nonsynonymous, splice donor, and splice acceptor site changes in the canonical transcripts.
Thirty-three EA subjects and 55 MA subjects in the study were also re-sequenced using Illumina NGS platform (i.e. NovaSeq). Raw sequence reads of WES were mapped to GRCh38 (hg38) and read for WGS were mapped to GRCh37 (hg19). Variants were called following the GATK best practice guidelines. Then the variant data generated was filtered using the same filtering criteria described above for the Ion Proton generated data. Filtered variant data of the 88 subjects generated by the Ion Proton platform used in the study were extracted and compared to filtered variant data generated by the Illumina Platform to verify variants with concordance.
Ultra-rare functional deleterious SNVs (URDVs) analysis. The alternate allele frequencies (aAFs) of variants reported for the exomes of the non-Finnish European (NFE) and Ad Mixed American (AMR) control populations in the genome aggregation database gnomAD v.2.1.1 (https ://gnoma d.broad insti tute.org/) were used to annotate the single nucleotide variants (SNVs) in the study 57 . Variants having ethnic aAF = 0 or absent in gnomAD v2.1.1 (controls) despite high location coverage in gnomAD were defined as ultra-rare SNVs (URVs). Variants having aAFs in NFE or AMR gnomAD v2.1.1 (controls) less than 0.01 were defined as rare, whereas aAF ≥ 0.01 were defined as common. Combined Annotation Dependent Depletion 16 (CADD; https :// cadd.gs.washi ngton .edu/) phred scores (C-score) of variants were used as the model to predict deleteriousness and variants with a C-score ≥ 20 (top 1% most deleterious) cutoff were defined as ultra-rare deleterious variants (URDVs) and retained for further comparison in the study. Genes with URDVs were further evaluated for potential association with MMC in the study cohort using the in silico function analysis tools. In addition, all URDVs identified in the NTD candidate genes in Supplementary Table 3 had been manually examined and confirmed using IGV Web App (https ://igv.org/app/) 58 .
Online variant analysis tools. Annotation of gene and function with Gene Ontology 59 (GO; http://geneo ntolo gy.org) terms were used to estimate the extent of enrichment for particular ontology in molecular function, biological process, cell components and pathways in each group of genes found with URDVs to evaluate their potential relevance to development of neural tube defects. Several web based tools were used including PubMed and Mouse Genome Database 9 ( http://www.infor matic s.jax.org/pheno types .shtml ) to compile a list of genes known to associate with neural tube defects (Supplementary Table 2) searching for terms: neural tube defects, open neural tube, spina bifida, anencephaly, exencephaly, craniorachischisis, and abnormal neural tube morphology. ToppFun (https ://toppg ene.cchmc .org/enric hment .jsp) and ToppCluster 20 (https ://toppc luste r.cchmc .org/) were used to evaluate ontology enrichment and for comparing multiple groups of genes (Chen et al. 2009). Genes in the GO terms were extracted using g:Profiler 60  www.nature.com/scientificreports/ to loss-of-function mutation (pLi score) and haploinsufficiency were annotated using resources from DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources(DECIPHER; http://decip her.sange r.ac.uk) 61 . To annotate expression of the genes of interest in animal neural tube, a database for human neural tubes during Carnegie stages CS12 and CS13 22 (Krupp et al., 2008) was used. Genes expressed in mouse future spinal cord between Theiler Stages TS11-19 and early embryonic lethality mouse genes (MP:0008762) were extracted from MGI (http://www.infor matic s.jax.org/) for function annotation.
NTD candidate genes. Increasing numbers of published reports showed compound heterozygote deleterious variants of mouse NTD genes in human NTD subjects suggesting these variants as genetic risk factors for human NTDs. To facilitate variant containing gene prioritization, a list of 551 genes (Supplementary Table 2) was assembled from Mouse Genome Database 9 (http://www.infor matic s.jax.org/) using search terms: open neural tube (MP:00000929), exencephaly (MP:0000914), anencephaly (MP:0001890), craniorachischisis (MP:0008784), and abnormal neural tube closure (MP:0003720). Also included were RARA , RARG , KIAA0586 and KATNAL2 with evidence of open neural tube defects in mice, chicken, frog and human [62][63][64] . Mouse NTD mutants not assigned to a gene and genes assigned only to spina bifida occulta were not included in this study. In addition, a list of 14 genes were added because association has been established in our study cohort (Supplementary Table 2). Together, a total of 551 candidate genes were selected for the study.
Ethical compliance. Affected subjects and/or their parents were recruited for the research study with a consenting protocol approved by the University of Texas Health Science Center (UTHealth) at Houston Institutional Review Board.