Noncoding variants alter GATA2 expression in rhombomere 4 motor neurons and cause dominant hereditary congenital facial paresis

Hereditary congenital facial paresis type 1 (HCFP1) is an autosomal dominant disorder of absent or limited facial movement that maps to chromosome 3q21-q22 and is hypothesized to result from facial branchial motor neuron (FBMN) maldevelopment. In the present study, we report that HCFP1 results from heterozygous duplications within a neuron-specific GATA2 regulatory region that includes two enhancers and one silencer, and from noncoding single-nucleotide variants (SNVs) within the silencer. Some SNVs impair binding of NR2F1 to the silencer in vitro and in vivo and attenuate in vivo enhancer reporter expression in FBMNs. Gata2 and its effector Gata3 are essential for inner-ear efferent neuron (IEE) but not FBMN development. A humanized HCFP1 mouse model extends Gata2 expression, favors the formation of IEEs over FBMNs and is rescued by conditional loss of Gata3. These findings highlight the importance of temporal gene regulation in development and of noncoding variation in rare mendelian disease.

The noncoding human genome contains cis-regulatory elements (cREs) that can be bound by transcription factors (TFs) and act as cell-type-specific enhancers or silencers to define complex gene regulatory programs 1-3. Recent advances have revealed that cRE variants may cause rare disease 4-6; however, determination of the precise mechanism is difficult due to the need to study cREs in their relevant cellular and temporal context. Such studies are particularly challenging for developmental disorders where the fate of a small number of progenitors is defined by dynamic transcriptional states 7-13.
HCFP1 is a rare autosomal dominant disorder of absent or limited facial movement that was mapped to a 3-cM region of chromosome 3q21.2-22 (refs. 14,15). Neuropathology revealed a decreased number of FBMNs and facial nerve hypoplasia 16. Sequencing of genes in the critical region, including GATA2, did not identify pathogenic coding variants 17.
In the present study, we report that HCFP1 results from noncoding variants within a cell-type-specific GATA2 regulatory region. We identified two adjacent clusters of noncoding SNVs that alter a conserved cRE (cRE2) and overlapping tandem duplications of cRE2 and the adjacent GATA2 enhancers, cRE1 and cRE3. We demonstrate that one cRE2 SNV cluster impairs binding of nuclear receptor subfamily 2 group F member 1 (NR2F1; COUP-TF1) and attenuates its repressive activity in a cell-specific manner. We show that GATA2, and its downstream effector GATA3 (refs. 18,19), are necessary to differentiate rhombomere 4 motor neurons (r4MNs) to IEEs but are dispensable for FBMN development. By contrast, a humanized cRE1 duplication mouse has ectopic expression of Gata2 in developing FBMNs and this phenotype is rescued by genetically ablating Gata3. This mechanism highlights the importance of tight temporal control of TF expression in a cell-type-specific manner during development and supports whole-genome sequencing (WGS) to identify noncoding variation underlying rare Mendelian disorders.
Article https://doi.org/10.1038/s41588-023-01424-9
nucleotides (Fig. 1c). Six SNVs are absent from gnomAD and other public databases, including chr3:128,178,298G>A, which appears to have arisen independently in Fam9 and Fam14 (Extended Data Fig. 1b). By contrast, Fam7 and Fam8 share a rare ancestral haplotype flanking chr3:128,178,297A>G (Extended Data Fig. 1b), a variant present in six gnomAD v.3.1.2 individuals (rs987263273, minor allele frequency = 4 × 10^-5). Although Cluster A variants were fully penetrant, Cluster B variants in Fam7, Fam14 and possibly Fam6 had reduced penetrance.
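As a rough plausibility check on the allele frequency quoted above for rs987263273, the short Python sketch below divides the six carrier alleles by an assumed total allele number. The genome count (about 76,156 for gnomAD v.3.1.2) and the assumption that all six carriers are heterozygous are not stated in the text and are used here only for illustration; the true allele number at this site may differ.

```python
# Back-of-the-envelope allele frequency check (assumptions flagged below).
carriers = 6                  # gnomAD v.3.1.2 individuals carrying the variant (from the text)
genomes_assumed = 76_156      # ASSUMPTION: approximate number of genomes in gnomAD v.3.1.2
alleles_per_carrier = 1       # ASSUMPTION: every carrier is heterozygous

allele_count = carriers * alleles_per_carrier
allele_number = 2 * genomes_assumed      # two alleles per diploid genome
maf = allele_count / allele_number

print(f"estimated minor allele frequency = {maf:.1e}")
```

Under these assumptions the estimate rounds to the reported order of magnitude, 4 × 10^-5.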

HCFP1 facial weakness is a neurogenic disorder
We examined a subset of participants to determine whether SNVs and duplications resulted in similar phenotypes. Among the 37 variant-positive participants with detailed phenotypic documentation, 2 were clinically unaffected and 4 had mild weakness but considered themselves unaffected ( Fig. 1a and Supplementary Table 1). These six individuals all harbor SNVs, suggesting that SNVs can cause a milder phenotype. Among the 35 participants with visible facial weakness, 83% (29 of 35) had bilateral weakness, which was typically asymmetrical with regard to both sidedness and upper versus lower face, and facial nerves (cranial nerve VII) were hypoplastic on magnetic resonance imaging (MRI; Fig. 2a-q). Electromyography, nerve conduction studies, blink studies, acoustic stapedial reflex testing and auditory brainstem response studies were consistent with facial nerve neuropathy in the seven participants tested (Supplementary Clinical Note and  Supplementary Tables 1 and 2). Thus, HCFP1 is neurogenic 16 and both SNVs and duplications cause nonsyndromic, mild-to-moderate severity CFP, supporting a shared neurodevelopmental mechanism.

Variants alter cREs within a GATA2 regulatory region
All five CNVs duplicate highly conserved noncoding regions that we refer to as cRE1, cRE2 and cRE3 located 3′ of GATA2 and flanking DNAJB8. All seven SNVs are located within cRE2 (Fig. 1b,c and Extended Data Fig. 2a). GATA2 encodes a pleiotropic TF that regulates numerous genes critical for embryonic development and neuronal cell fate 25,26 and haploinsufficiency results in blood and immune disorders. Multiple cREs contribute to regulation of GATA2 expression in the blood, kidney and brain 27,28 . Among these, cRE1 and cRE3 function as enhancers and drive β-galactosidase expression in mice in a pattern recapitulating native Gata2 expression, including in r4 of the developing hindbrain 29 . Examination of published data 1,30,31 (Extended Data Fig. 2b) reveals that GATA2, but not DNAJB8, is transcribed in many cell types. The cRE1-3 overlaps with regions of chromatin open only in neuroblastoma cell lines, where GATA2 is also transcribed. Published chromatin immunoprecipitation sequencing (ChIP-seq) experiments in neuroblastoma lines show binding of GATA2 and GATA3 to cRE1 and cRE3, but not cRE2 (Extended Data Fig. 2c) 1,32 . These data highlight co-regulation and cell-type specificity of cRE1-3 and support them as part of a GATA2 regulatory region in human neuroblastoma cell lines and in mice 29,33 .

Tandem duplications and noncoding SNVs at the HCFP1 locus
We enrolled families and simplex cases with nonsyndromic congenital facial paresis (CFP, cohort 1 US-based study) and performed genome-wide single-nucleotide polymorphism (SNP) analysis and whole-exome sequencing (WES) in two large dominant pedigrees, family 1 (Fam1) and family 9 (Fam9; Fig. 1a). SNP-based multipoint parametric linkage analysis assuming autosomal dominant inheritance and full penetrance yielded maximum lod (logarithm of odds) scores suggestive of linkage at an overlapping 63-Mb chr3 region encompassing the previously reported HCFP1 locus 14,15 (Fig. 1b and Extended Data Fig. 1a). WES analysis did not identify pathogenic coding variants within the suggestive regions of linkage in either family. To identify HCFP1 variants, we performed WGS of members of Fam1, Fam9 and seven additional HCFP pedigrees in cohort 1 (two vertical, one horizontal transmission and four simplex cases). Structural variation analysis 20 revealed 31-kb and 20-kb overlapping tandem duplications within the HCFP1 locus in Fam1 and Fam2 (de novo), respectively (Fig. 1a,b and Extended Data Fig. 1b,c). We next analyzed WGS for SNVs or indels (insertions and deletions) within the Fam1/Fam2 ~18-kb minimum duplication region. Fam3, Fam7 and Fam9 each harbored a unique SNV within an ~270-bp, noncoding, conserved element (chr3:128,178,158-128,178,397; GRCh37/hg19). We resequenced and conducted droplet digital PCR (ddPCR) of this element in the remaining cohort 1 probands: 2 pedigrees with vertical transmission, 4 sibling pairs and 31 simplex cases. SNVs were identified in dominant Fam4 and Fam8 and simplex Fam5 (de novo) and Fam6 (Fig. 1a,c and Extended Data Fig. 1d-f).
Cohort 2 (Europe-based study) included the two pedigrees that originally defined the HCFP1 locus 14,15 , in whom we identified a 23-kb tandem duplication in Fam10 and an SNV in Fam14. WGS analysis of 14 additional probands in cohort 2 (4 vertical, 2 horizontal, 2 unknown transmission and 6 simplex cases) identified variations that segregated with affected individuals in three dominant pedigrees: duplications were detected in Fam11 and Fam12 and an SNV was detected in Fam13 ( Fig. 1a-c and Extended Data Fig. 1b,e,f).
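For readers unfamiliar with lod scores, the following Python sketch illustrates the quantity in its simplest textbook form: a two-point score for phase-known, fully informative meioses. It is not the SNP-based multipoint parametric analysis used in the study; the function name and the example counts are hypothetical and serve only to show how a lod score compares the likelihood of linkage at recombination fraction theta against free recombination (theta = 0.5).

```python
import math

def two_point_lod(n_meioses: int, n_recombinants: int, theta: float) -> float:
    """Two-point lod score for phase-known meioses (simplified illustration).

    lod(theta) = log10( theta^k * (1 - theta)^(n - k) / 0.5^n )
    """
    if not 0 <= n_recombinants <= n_meioses:
        raise ValueError("n_recombinants must be between 0 and n_meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")

    n, k = n_meioses, n_recombinants
    if theta == 0.0:
        if k > 0:
            # Any observed recombinant is impossible under complete linkage.
            return float("-inf")
        log_linked = 0.0  # likelihood is exactly 1 when theta = 0 and k = 0
    else:
        log_linked = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)

    log_unlinked = n * math.log10(0.5)
    return log_linked - log_unlinked

# Hypothetical example: 10 informative meioses with no recombinants.
print(round(two_point_lod(10, 0, 0.0), 2))
```

With no recombinants each informative meiosis contributes log10(2), about 0.3, to the score, which is why large pedigrees such as Fam1 and Fam9 are needed to approach conventional significance thresholds.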
Gata2 is expressed in r4 as early as E8.5 (ref. 19) and has been proposed to work through Gata3 to regulate IEE and FBMN development under the control of HOXB1 (refs. 19,42-45). We found that expression of Isl1, a crucial determinant of motor neuron identity 46, marked both developing r4MNs and the stream of caudally migrating FBMNs (Fig. 3a,b). Gata2 expression overlapped with Isl1 in r4 and was prominent in parasagittal stripes of interneurons 19 but absent from migrating FBMNs (Fig. 3b).
The mouse facial nerve innervates large, extrinsic muscles that displace the whisker pad and small, intrinsic muscles surrounding each vibrissal follicle 51. To examine facial nerve function, we developed a semiquantitative whisking assay, collecting high-speed video recordings of vibrissal movement as mice ran on a treadmill, and scored left and right whisker movements (Fig. 3j). Gata2 KO/flox ;Phox2b-Cre + and Gata3 tlz/flox ;Phox2b-Cre + mice showed full whisking that was indistinguishable from WT (Fig. 3k and Supplementary Videos 1a-c). Thus, Gata2 and Gata3 are master regulators of IEE but not FBMN development.

WT but not mutant cRE2 silences cRE1 and cRE3 in FBMNs
As HCFP1 duplications and SNVs cause the same phenotype in humans and cRE1 and cRE3 are Gata2 enhancers in mice 29, we hypothesized that cRE2 was a cell-type-specific Gata2 silencer 13,52. If so, SNVs could weaken the silencing by attenuating TF binding and duplications could disrupt regulatory balance. Either would cause abnormal Gata2 expression. To test this hypothesis in vivo, we evaluated whether different cRE combinations drove β-galactosidase expression when coupled to a lacZ reporter targeting a specific locus in the mouse genome 53. We designed donor DNA constructs containing different cRE combinations (Fig. 4a). The cRE1 alone drove β-galactosidase expression in the region of r4MN precursors and migrating FBMNs, as well as in midbrain and spinal cord (Fig. 4b,c and Extended Data Fig. 3a), similar to published data 29. The cRE3 alone drove expression restricted to r4MNs, lateral r4 where migrating IEEs and nascent FBMN/IEE axons overlap, and migrating FBMNs (Fig. 4d and Extended Data Fig. 3b). Thus, although cRE1 and cRE3 enhance β-galactosidase expression in a Gata2 pattern, they also mark Gata2-negative migrating FBMNs. By contrast, cRE2 alone did not drive β-galactosidase expression, consistent with silencing activity (Fig. 4e and Extended Data Fig. 3c). When cRE2 was combined with cRE1 or cRE3, we detected β-galactosidase expression in r4MNs and migrating IEEs but no longer in migrating FBMNs, consistent with absence of Gata2 expression in these cells (Figs. 3b and 4f,g and Extended Data Fig. 3d,e). The cRE2 with Fam3-5 Cluster A SNVs, when combined with cRE1 (cRE1 + cRE2*A), no longer attenuated cRE1-driven lacZ signal in migrating FBMNs, indicating that these SNVs prevented cRE2-mediated silencing (Fig. 4h and Extended Data Fig. 3f). The effect of cRE2 carrying the three Cluster B SNVs, when combined with cRE1 (cRE1 + cRE2*B), was less clear, because the signal was attenuated in only one of eight embryos tested (Fig. 4i and Extended Data Fig. 3g).
It is interesting that expression of cRE2-mutant clusters alone (cRE2*A or cRE2*B) showed some neuronal signal only in tandem, not single, transgenic embryos (Extended Data Fig. 3h,i). Similarly, cRE1 + cRE2*A showed an overall stronger and more intricate lacZ pattern compared with cRE1 + cRE2 (Extended Data Fig. 3d,f). Overall, these in vivo data support our hypothesis that HCFP1 SNVs disrupt a cell-specific regulatory element (cRE2) that normally downregulates Gata2 expression in developing FBMNs.

Cluster A SNVs attenuate binding of NR2F1 to cRE2
We performed in silico prediction of TF-binding sites conserved between the cRE2 of humans and that of mice 54. Cluster B SNVs were not predicted to alter conserved TF-binding sites. By contrast, Cluster A SNVs alter three nucleotides (5′-AGGTCA-3′) of a consensus sequence of the COUP-TF family, NR2F1 and NR2F2 (Fig. 4j) 55. Nr2f1 is a determinant of cell-type specification and temporal fate of the developing cortical neurons and glia 55. It is expressed throughout the hindbrain by E8.5 and enriched in facial and other cranial motor nuclei by E9.5 (refs. 56,57). Re-analysis of published ChIP-seq data from human induced pluripotent stem cell-derived cranial neural crest cells 58, which share a similar origin with neuroblastoma cells, revealed NR2F1 binding to cRE2 but not cRE1 or cRE3 (Extended Data Fig. 2c). NR2F2 did not bind cRE2 in human cranial neural crest cells 59. Notably the mouse, but not the human, cRE1 sequence contains a COUP-TF-binding site (mm10 chr6:88,226,527-88,226,549). This, together with a murine-specific 4-bp deletion between cRE2 Clusters A and B (Fig. 1c), suggests differential cRE1-cRE3 binding and function of COUP-TF in the two species.
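To make the idea of an SNV disrupting a consensus site concrete, the Python sketch below scans both strands of a sequence for exact matches to the 5′-AGGTCA-3′ half-site named above. It is an illustration only: the wild-type and mutant sequences are invented rather than taken from cRE2, the substituted position is hypothetical, and real TF-binding prediction tools typically score position weight matrices rather than require exact matches.

```python
# Exact-match scan for a nuclear receptor half-site on both DNA strands.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_half_sites(seq: str, motif: str = "AGGTCA") -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every exact motif match."""
    seq = seq.upper()
    motif_rc = reverse_complement(motif)
    hits = []
    for start in range(len(seq) - len(motif) + 1):
        window = seq[start:start + len(motif)]
        if window == motif:
            hits.append((start, "+"))
        elif window == motif_rc:
            hits.append((start, "-"))
    return hits

# HYPOTHETICAL sequences, not the real cRE2 sequence.
wild_type = "TTCAGGTCAGCA"
# Substitute one base inside the half-site (index 5, G>A) to mimic an SNV.
mutant = wild_type[:5] + "A" + wild_type[6:]

print("wild type:", find_half_sites(wild_type))
print("mutant:   ", find_half_sites(mutant))
```

In this toy example the single substitution removes the only exact match, which mirrors in spirit how a Cluster A SNV could abolish a predicted COUP-TF site.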
We performed an electrophoretic mobility shift assay (EMSA) that both confirmed interaction of NR2F1 with cRE2 sequence and demonstrated attenuated interaction with HCFP1 Cluster A variants in vitro ( Fig. 4k and Extended Data Fig. 4a-f). To evaluate the effect of cRE2 Cluster A SNVs in vivo, we generated a knockin mouse carrying the Fam5 SNV (Extended Data Fig. 5a). Fam5 snv/snv mice (chr6:88,224,892A>G) were viable and fertile and had normally developed facial motor nuclei and whisking (Fig. 3k, Supplementary Video 1d and Extended Data Fig. 5b-e). Despite the absent phenotype, conservation between mouse and human Cluster A sequences led us to test whether NR2F1 bound to WT Cluster A in r4MNs in vivo and whether the Fam5 SNV disrupted this interaction.
We dissected and FAC-sorted green fluorescent protein-positive (GFP + ) cells from the r4 hindbrain of E10.5 WT;Isl1 MN -GFP and Fam5 snv/snv ; Isl1 MN -GFP embryos, in which GFP specifically labels motor neurons 60 , and performed single-cell CUT&Tag 61 using an anti-NR2F1 antibody (Fig. 5a,b). We detected specific binding of NR2F1 to WT cRE1, cRE2 and, to a lesser extent, cRE3. By contrast, Fam5 snv/snv r4MNs showed reduced cRE2 peak height compared with WT, without change in cRE1 and cRE3 peaks (Fig. 5c). Together, this shows that NR2F1 binds cRE2 in vitro and in r4MNs, and Cluster A SNVs attenuate this binding.

Mice heterozygous for a humanized cRE1 duplication have HCFP
We generated a human cRE1 duplication mouse by inserting tandem copies of the human cRE1 sequence between mouse cRE1 and cRE2 (Extended Data Fig. 5f). We chose this approach because the cRE1 NR2F1-binding site in mice but not humans could alter the mouse phenotype. Mice heterozygous for the human cRE1 duplication (cRE1 dup/+ ) were viable and fertile, and had absent whisker movement consistent with HCFP1 (Fig. 3k and Supplementary Video 1e).
Informed by known cell identity markers and those identified in the present study, we merged data from both genotypes, classified 16 clusters on the Uniform Manifold Approximation and Projection (UMAP) plot and found that clustering and cell-cycle phase were similar between the two genotypes (Table 3). Clusters 1-6 defined a developmental trajectory of r4MNs comprising mitotic progenitors of r3-r7 neurons (Cluster 1) through to bipotent r4MNs (Cluster 4) that gave rise to IEEs (Cluster 5) and FBMNs (Cluster 6) (Fig. 6a-c and Extended Data Fig. 6c,d). Cluster 5 IEE cellular density was increased whereas Cluster 6 FBMN cellular density was decreased in cRE1 dup/+ embryos compared with WT (Fig. 6a-c). Dnajb8 was not expressed in any clusters of either genotype (Extended Data Fig. 6c,d).

GATA2 localization is expanded in developing cRE1 dup/+ r4MNs
We used multichannel immunofluorescent staining of IEEs and FBMNs in E10.5-E16.5 r4-r6 hindbrain sections to determine whether changes in r4MN organization supported a WT IEE-to-FBMN developmental switch that was altered in cRE1 dup/+ embryos. We focused on E14.5, when the broad contours of IEE and FBMN organization are first apparent and Gata2 is not yet downregulated (Fig. 7, single channels in Extended Data Fig. 8).
In WT embryos at E10.5, FBMNs (defined as ISL1 ON ;GATA2 OFF ; GATA3 OFF ) were distinguishable from IEEs (defined at this age as ISL1 ON ;GATA2 ON with variable GATA3 expression and at later ages as ISL1 ON ;GATA2 ON ;GATA3 ON ) (Extended Data Fig. 9a,b). By E12.5, FBMNs formed dorsal clusters flanking the r4 midline, whereas GATA2 and GATA3 delineated smaller ventral populations of IEEs that were migrating laterally and ventrally to form the OCN nucleus. Bilateral columns of ISL1 OFF ;GATA2 ON ;GATA3 ON interneurons were detected between the midline r4MN clusters and developing IEEs 43 and NR2F1 expression was elevated in FBMNs and reduced in IEEs (Extended Data Fig. 9c-n). At E14.5, IEEs formed variably detected dorsal VEN clusters and more prominent ventral OCN clusters (Fig. 7a,b). FBMNs   Fig. 9o-r).
In cRE1 dup/+ embryos at E10.5, GATA2 and GATA3 expression extended ectopically throughout r4MNs (Extended Data Fig. 9a,b). By E12.5, most r4MNs had adopted an 'IEE' molecular identity with many ectopically occupying the dorsal region of r4, and FBMNs expressed NR2F1 but were reduced at the r4 midline compared with WT (Extended Data Fig. 9c-n). At E14.5, OCNs occupied normal positions in the ventral hindbrain but also extended caudally into r6 and a larger population of ectopic 'IEEs' occupied positions in the dorsal hindbrain in the region of WT VENs (Fig. 7e,f). Ectopic 'FBMNs' were scattered throughout r4 and also formed a hypotrophic facial nucleus that extended from r4 to r6 (Fig. 7e-h; schema in Fig. 7i-k). At E16.5, the cRE1 dup/+ ventral OCN cluster extended ectopically into r6, the dorsal ectopic IEEs formed an expanded VEN cluster and the facial nucleus appeared small to absent (Extended Data Fig. 9o-r).
We quantified ectopic cell positions and changes in r4MN gene expression caused by cRE1 duplication by determining the size and position of ISL1 ON ;GATA2 ON IEE and ISL1 ON ;GATA2 OFF FBMN subpopulations in E14.5 WT and cRE1 dup/+ hindbrains. The average number of r4-born motor neurons did not differ between genotypes (Fig. 7l). However, although WT embryos generated a 1:9.3 ratio of IEE:FBMN cells, the cRE1 dup/+ embryo ratio was 1:1.3, with the number of IEEs adopting an OCN and VEN identity increasing over threefold and tenfold, respectively (Fig. 7m,n). Last, cRE1 dup/+ embryos had a 32% decrease in FBMNs (Fig. 7m) and, although 92% of E14.5 WT FBMNs completed migration into ventral r6, only 37% of cRE1 dup/+ FBMNs had, with the balance assuming ectopic positions in r4-5 (Fig. 7o).
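The percentages in this paragraph can be combined to estimate how many FBMNs reach their correct position in ventral r6. The Python sketch below normalizes the WT FBMN count to 1 (an arbitrary choice made here for illustration) and multiplies the reported 32% loss of FBMNs by the reported migration fractions; the result agrees with the 73% reduction in correctly positioned FBMNs that the text cites when summarizing these data.

```python
# Combine the reported E14.5 figures (WT FBMN count normalized to 1.0).
fbmn_decrease = 0.32     # fractional loss of FBMNs in cRE1 dup/+ embryos
wt_migrated = 0.92       # fraction of WT FBMNs that reached ventral r6
dup_migrated = 0.37      # fraction of cRE1 dup/+ FBMNs that reached ventral r6

wt_total = 1.0                                  # normalization, not a cell count
dup_total = wt_total * (1.0 - fbmn_decrease)    # FBMNs remaining in cRE1 dup/+

wt_positioned = wt_total * wt_migrated
dup_positioned = dup_total * dup_migrated

reduction = 1.0 - dup_positioned / wt_positioned
print(f"reduction in correctly positioned FBMNs: {reduction:.0%}")
```

The calculation shows that the deficit arises from two compounding effects: fewer FBMNs are generated, and a smaller share of those completes migration.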
As Dnajb8 lies between cRE1 and Gata2, we evaluated it as an HCFP1 target gene. In situ hybridization with Dnajb8 riboprobe revealed no expression in developing WT or cRE1 dup/+ hindbrain, whereas staining with Isl1 and Gata2 probes recapitulated protein antibody staining (Extended Data Fig. 10a-c). These observations are consistent with scRNA-seq data and confirm that changes in Dnajb8 expression are unlikely to underlie HCFP1.
These data establish that the humanized duplication of cRE1 perturbs r4-derived MN expression of Gata2 but not Dnajb8. They provide evidence of an IEE-to-FBMN birth order, with a developmental switch active from E9.25 to E10.5 in WT embryos that extends beyond E11.0 in cRE1 dup/+ embryos, producing IEEs at the expense of FBMNs. The 73% reduction in FBMNs correctly positioned in the caudal hindbrain in E14.5 cRE1 dup/+ embryos probably underlies their facial paralysis.

Loss of Gata3 in cRE1 dup/+ mice partially rescues CFP
If cRE1 duplication results in the HCFP1 phenotype by causing ectopic expansion of Gata2 in r4MNs, then removal of Gata2 from cRE1 dup/+ mice should rescue the phenotype. Linkage disequilibrium prevented crossing the cRE1 dup allele on to the Gata2 KO/flox ;Phox2b-Cre + cKO background. As Gata3 is a Gata2 transcriptional target and conditional removal of Gata2 or Gata3 eliminates IEE generation but preserves FBMNs (Fig. 3), we tested whether conditional Gata3 deletion would rescue the cRE1 dup CFP phenotype.
We evaluated whisking after conditional removal of Gata3 from cRE1 dup/+ mice. Six of seven cRE1 dup/+ ;Gata3 tlz/flox ;Phox2b-Cre + mice had variable and often asymmetrical rescue of whisking, ranging from subtle movement in subsets of whiskers to complete restoration of whisking ( Fig. 3k and Supplementary Video 1f-h). Comparison of E14.5 histologies revealed that conditional removal of Gata3 from cRE1 dup/+ embryos eliminated the large r4 ectopic population of dorsal ISL1 ON (and ISL1 ON ;GATA2 ON ) cells as well as IEEs, and generated an elongated column of FBMNs that extended into ventral r6 to form a structure closer in size and shape to the facial nucleus seen in WT controls (Fig. 8a-j). These data establish that human cRE1, in concert with cRE2 and cRE3, modulates the Gata2-Gata3 axis that defines the IEE-to-FBMN switch, and human HCFP1 pathogenic variants probably alter this regulatory pathway (Fig. 8k).

Discussion
We report that heterozygous noncoding SNVs and CNVs at the HCFP1 locus alter regulation of GATA2 and account for >90% of autosomal dominant, nonsyndromic CFP. Remarkably, the SNVs alter six nucleotides located in two clusters within a conserved noncoding region that we refer to as cRE2, located 3′ of DNAJB8 and GATA2. DNAJB8 is not a triplosensitive gene (pTriplo score 0.22) (ref. 62) nor is it expressed in r4MNs or surrounding tissue in WT or cRE1 dup/+ mice, excluding its involvement in HCFP1. Instead, our data support cRE2 as a tissue-specific regulatory element to which NR2F1 binds, restricting r4MN GATA2 expression to developing IEEs.
The importance of Gata2 expression in an r4MN IEE-to-FBMN fate transition and the perturbation of its spatial and temporal hindbrain expression as the cause of HCFP1 are supported by our data and those of others. First, we established GATA2 and GATA3 as essential regulators of IEE fate and dispensable for FBMN development and migration. Second, we found that Gata2 enhancers cRE1 and cRE3 drive reporter expression in migrating FBMNs where Gata2 is not expressed and cRE2 silenced this expression. Moreover, this silencing is attenuated by HCFP1 SNVs. Although the cRE2 silencing mechanism remains unknown, cRE1-3 and Gata2 are within the same regulatory region and the cREs might compete for binding to the Gata2 promoter. Third, our humanized cRE1 duplication mouse model has CFP, and scRNA-seq and histology revealed ectopic Gata2 expression in later-born cRE1 dup/+ r4MNs that expanded the IEE and depleted the FBMN populations. This phenotype could be partially rescued by removal of Gata3. Last, monoallelic loss-of-function variants in GATA2 and in the +9.5-kb blood GATA2 enhancer element cause blood and immune dysfunction without facial weakness 63,64 , consistent with altered, not reduced, GATA2 expression in HCFP1 and highlighting the importance of tissue-specific regulation.
Several lines of evidence support a cell-type-specific function of NR2F1 in r4MN IEE-to-FBMN fate transition and attenuation of this function in HCFP1. First, we demonstrated that NR2F1 binds to cRE1 and cRE2 in WT r4MNs, and binding to cRE2 is reduced in r4MNs isolated from mice carrying a Cluster A SNV. Second, we found dynamic expression of Nr2f1 in developing FBMNs, with reduced expression in IEEs. Third, although human haploinsufficiency of NR2F1 causes a variable phenotype characterized primarily by intellectual disability and optic nerve degeneration 65, several individuals are reported to have a thin facial nerve or mild facial weakness 66,67.

components of an E9.5-E12.5 scRNA-seq object comprising Isl1 + and/or Hoxb1 + FAC-sorted Isl1 MN -GFP cranial motor neurons (MNs) (with GFP − cells spiked in) spanning r3-r7. Seurat clusters are numbered and annotated according to proposed cellular identity at the right. CN, cranial nucleus. The black dotted arrows trace the proposed pseudotime developmental trajectory of r4MNs from mitotic progenitors of r3-r7 neurons (Cluster 1), r4MN mitotic progenitors (Cluster 2) and r4MN precursors (Cluster 3), 'bipotent r4MNs' (Cluster 4), which gave rise to separate populations of IEEs (Cluster 5) defined by Gata2 and Gata3 expression 18,19, and FBMNs (Cluster 6) defined by Syt4, Shox2 and Cdh8 expression and enriched for Nr2f1 (refs. 18,19,74,75) (Extended Data Fig. 6c,d). c, Overlapping feature plots of WT (blue, bottom layer) and cRE1 dup/+ (peach, top layer) 3D UMAPs shown in a and b. Sixty percent opacity of cRE1 dup/+ data points reveals WT data and highlights overlap of the genotypes (burgundy). d, Volcano plot of differential expression analysis between WT and cRE1 dup/+ r4MN trajectories across the E9.5-E12.5 timepoints. Circled genes display log(foldchange) > 1 and −log 10 (FDR) > 200 or are additional genes of interest (where FDR is false discovery rate). e, Dotplot comparison of FBMN and IEE marker expression in E9.5-E10.5 Cluster 1-6 r4MN developmental trajectories in WT (upper) and cRE1 dup/+ (lower) embryos. Red and green outlines highlight differences in Syt4 and Gata2 expression, respectively, between WT and cRE1 dup/+ samples. Scales indicate the mean expression level and percentage expressing cells within each cluster. f, Feature plots of WT and cRE1 dup/+ r4MN trajectory determinants and markers at E9.5 (upper two rows) and E10.5 (lower two rows). At E9.5 in both WT and cRE1 dup/+ embryos, r4MN precursors, a subset of IEE-directed bipotent r4MNs and IEEs (Clusters 3-5), expressed Gata2, with additional ectopic expression seen in cRE1 dup/+ FBMNs (Cluster 6). By E10.5, WT embryos expressed Gata2 only in Cluster 5 IEEs, but cRE1 dup/+ embryos maintained Gata2 expression in Clusters 3-5. g, Density plots for Nr2f1 and Gata2 expression in E9.5-E10.5 WT and cRE1 dup/+ r4MNs. See also Extended Data Fig. 6.

We favor NR2F1 over NR2F2 as key to the IEE-to-FBMN switch. We found no evidence that NR2F2 binds to cRE2 in public databases 59, and it shows low expression in developing r4MN, despite being upregulated in lateral FBMNs at late embryonic stages 38. NR2F2 appears important for metabolic and cardiac processes 68 rather than neuronal development 69 and NR2F2 haploinsufficiency in humans is associated with congenital heart defects without reports of facial weakness 70.
It is of interest that we did not observe a CFP phenotype in the Cluster A Fam5 SNV/SNV mice, despite alterations in NR2F1 binding. HCFP1 SNVs are less penetrant than CNVs, and the Fam5 SNV may cause a perturbation too mild to produce CFP in mice. It is also possible that the nonconserved NR2F1-binding site in mouse cRE1 attenuates the role of cRE2 in mouse r4MNs. Finally, introduction of cRE2 SNVs in our lacZ assay unveiled enhancer activity, probably through opportunistic binding of other TFs, which could vary between mice and humans 71.
We do not know the mechanism of Cluster B SNVs. In silico analysis predicted few if any TF consensus sequences in the Cluster B WT sequence. By EMSA, Cluster B SNVs did not alter NR2F1 binding and had less effect on β-galactosidase reporter expression. Loss of a nonconserved TF-binding site in Cluster B that acts in concert with NR2F1 could result in the indistinguishable Cluster A and Cluster B SNV phenotypes. Alternatively, COUP-TFs recruit co-factors to leverage their inhibitory activity 55,72 and aberrant binding of TFs to mutant Cluster B could attenuate NR2F1 function through steric hindrance or impaired cooperative binding 73.

3 (a,b), 3 (c,d), 4 (e,f), 3 (g,h) and 5 (i,j) embryos). k, Model depicting the effect of HCFP1 variants. Stage 1: in both WT (left side) and HCFP1 (right side) hindbrains, early born r4MN progenitors express Gata2, driven in part by cRE1 and cRE3 enhancers, and assume an IEE identity (red cells). Stage 2: in WT, NR2F1 (pink oval) binds to cRE2 in later-born r4MNs, silencing GATA2 and directing these cells to an FBMN identity (gray cells). In HCFP1, cRE2 SNVs disrupt NR2F1 binding (demarcated with X) and unimpeded cRE1 and cRE3 enhancers drive GATA2 expression in later-born r4MNs. Duplications of cRE1, cRE2 and cRE3 generate a net increase in GATA2 enhancer level, similarly expanding GATA2 expression. Either will increase IEEs at the expense of FBMNs, deplete the FBMN progenitor pool and result in CFP.
In summary, our results show that cell-type-specific Gata2 expression is critical for development of r4 IEEs and its subsequent downregulation drives a fate switch to FBMNs. This transition is tightly regulated by binding of TFs, including NR2F1, to the FBMN-IEE-specific regulatory elements cRE1, cRE2 and cRE3. HCFP1 noncoding variants alter this regulatory framework by pathologically prolonging Gata2 expression, favoring the formation of IEEs at the expense of FBMNs.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-023-01424-9.

Methods
Additional methods information can be found in Supplementary Information. Data were excluded from the study only if rendered uninterpretable for technical reasons, including damage to cryosections that precluded quantification. In these instances, a replicate sample was processed and included in the study. For scRNA-seq, one E9.5 dataset was excluded from the study due to high free RNA content and the experiment was repeated to generate a usable dataset.

Research participants
Adult participants and guardians of children provided written informed consent for participation. No participant compensation was provided. The NIH paid travel and visit expenses for participation in the NIH Clinical Center evaluation. Photographs were selected from participants who consented to publication of identifying two-dimensional face photographs. Sex, number and age of participants are provided in Supplementary Table 1. Phenotypes of the affected members were obtained through a visit to the NIH Clinical Center or through examinations conducted by co-authors. A blood and/or saliva sample was collected from each participant for extraction of genomic DNA.

Clinical evaluation
Multidisciplinary phenotyping studies were performed prospectively during a 1-week visit to the NIH Clinical Center for the 12 participants indicated in Supplementary Table 1. Studies included standardized examinations by clinical genetics, ophthalmology, audiology, dental/craniofacial, rehabilitation medicine, speech therapy, neurology, cardiology, neurocognitive and behavioral testing, as well as brain imaging, neurophysiology and laboratory studies, per protocol NCT02055248. Additional details are provided in Supplementary Methods.

Whole-genome sequencing
WGS was performed and interpreted independently for the two cohorts. Additional details are provided in Supplementary Methods.

Targeted sequencing and variant validation and haplotypes
The cRE2-conserved noncoding region on chromosome 3 was amplified with KAPA2G Fast ReadyMix (KAPA Biosystems) and Sanger sequenced bidirectionally (Genewiz). SNV confirmation and segregation were evaluated in all available family members by Sanger sequencing. Alignment of the electropherograms was performed using Geneious Prime v.2021.1.1 (Dotmatics). Screening for CNVs in the conserved chromosome 3 region and DNAJB8 was performed by ddPCR. The hTERT (catalog no. 4403316) or RNaseP probes (Thermo Fisher Scientific, catalog no. 4403326) served as an internal copy number control. CNVs were confirmed using breakpoint-spanning PCR when possible. All primers and probes are listed in Supplementary Table 4. Additional details are provided in Supplementary Methods.
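As an illustration of how a ddPCR reading is commonly converted into a copy number estimate against an internal control, the following minimal sketch may help. It assumes the reference probe (for example hTERT or RNaseP) is present at two copies per diploid genome; the function name, parameter names and example concentrations are hypothetical and are not taken from the study.

```python
def copy_number(target_conc, reference_conc, reference_copies=2):
    """Estimate target copy number from ddPCR concentrations.

    target_conc and reference_conc are concentrations (e.g. copies per
    microliter) measured in the same reaction. reference_copies is the
    assumed copy number of the control locus (2 for a diploid autosome).
    """
    if reference_conc <= 0:
        raise ValueError("reference concentration must be positive")
    return reference_copies * target_conc / reference_conc


# Hypothetical readings: a target at 1.5x the reference concentration
# corresponds to three copies, as expected for a heterozygous duplication.
dup_estimate = copy_number(1500.0, 1000.0)
normal_estimate = copy_number(1000.0, 1000.0)
print(dup_estimate, normal_estimate)
```

Under these assumptions an estimate near 2 is consistent with a normal diploid locus and an estimate near 3 with a heterozygous duplication; thresholds for calling a CNV are not specified here.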

Mouse husbandry
Animal husbandry was according to NIH guidelines and approved by the Institutional Animal Care and Use Committees of Boston Children's Hospital (protocol no. 00001852), the Icahn School of Medicine at Mount Sinai (protocol no. 2015-0052) and the Lawrence Berkeley National Laboratory (protocol nos. 290003 and 290008). Breeding pairs were separated after the detection of a vaginal plug at 9am, which was considered to be E0.5. The sex of the experimental embryos was not determined.

Experimental mouse lines
Generation and acquisition of transgenic mouse lines, breeding strategies for experimental crosses and species, strain, sex, number and age of experimental animals are described in Supplementary Methods.

LacZ assay
Transgenic E11.5 mouse embryos were generated and analyzed as described previously 80. Additional details are provided in Supplementary Methods.

Whisker movement assay
Mice aged 4 weeks to 5 months (20 males, 31 females) of the indicated genotypes were recorded in the .MOV format with the 'Slo-Mo' function on an iPhone v.6 (which records at ~120 frames per s) while walking on a treadmill. Each video recorded the superior view of the mouse's face and body and was at least 2 min in length at the decreased frame rate. After a training session to standardize interpretation, four independent reviewers blinded to mouse genotype reviewed the unedited videos using Apple QuickTime Player (v.10.5) and scored left-side and right-side whisker movement on a scale of 0-3: '3' indicated the full trajectory of all whiskers as observed in WT mice, '2' indicated a slight reduction in range of motion or in number of whiskers moving, '1' indicated a dramatic reduction in range of motion or in number of whiskers moving and '0' indicated no detected whisker movement. Statistical analysis was performed using unpaired, two-sided Wilcoxon's testing. For presentation as a supplementary video, recordings were cropped, enlarged and edited for length in iMovie 10.3.5 (Apple, Inc.) for representative examples of treadmill walking 8-12 s in duration. Videos were 'cropped to fit' in iMovie to enlarge and focus on the head. Video segments were compiled into a single video file, with annotations generated in Microsoft 365 PowerPoint and imported as separate slides with iMovie.
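The unpaired, two-sided Wilcoxon comparison of ordinal whisker scores can be sketched as follows. This is a minimal illustration, not the authors' analysis: it uses a normal approximation with a tie correction and no continuity correction, the example scores are invented, and how scores from the four reviewers and the two sides were combined per mouse is not stated in the text and is left out.

```python
from collections import Counter
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Unpaired two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    x and y are lists of scores. Returns (U statistic for x, approximate
    two-sided P value) using midranks for ties and a normal approximation.
    """
    pooled = sorted(x + y)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u1 = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    # Variance of U with the standard correction for tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sqrt(var_u)
    return u1, erfc(abs(z) / sqrt(2))


# Hypothetical scores on the 0-3 scale described above.
wt_scores = [3, 3, 3, 3]
mutant_scores = [1, 0, 2, 1]
u_stat, p_value = rank_sum_test(wt_scores, mutant_scores)
print(u_stat, p_value)
```

With samples this small an exact test would normally be preferred over the normal approximation; the sketch is meant only to show how ranks and ties enter the statistic.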

Dissection and dissociation of embryonic r4 motor neurons
ISL1 MN-GFP+ and surrounding GFP− tissues were microdissected from E9.5, E10.5, E11.5 and E12.5 WT and cRE1 dup/+ hindbrains. To capture the anatomical extent of lateral IEE and caudal FBMN migration, the developing hindbrain from the caudal edge of the trigeminal motor nucleus through the rostral third of the glossopharyngeal/vagus nuclei was collected. Single-cell suspensions were generated from dissected hindbrain tissue with enzymatic digestion and trituration (Papain Dissociation System, catalog no. LK003150) (ref. 81).

FACS
GFP+ cranial motor neurons were collected from single-cell suspensions of dissociated embryonic hindbrains using a BD FACSARIA II Cell Sorter equipped with BD FACSDiva 8.0.2 software and a 100-μm nozzle. Isl1 MN-GFP r4MNs were selected based on GFP reporter expression and found to comprise 2-6% of the total cellular input. Immediately before completion of Isl1 MN-GFP+ cell sorting, GFP gates were lifted to sample a representative spike of GFP− cells from the surrounding tissues and to reach an optimal number of total cells for the 10× protocol. These cells were collected into a single well of a 96-well plate containing 5 μl of 0.4% bovine serum albumin (BSA) in Hibernate E Low Fluorescence medium (HE-Lf, Brainbits).

ScRNA-seq
ScRNA-seq was performed following the Single Cell 3′ Reagent kits v.3.1 User Guide (10× Genomics). The resulting libraries were sequenced on a NextSeq500 platform (Illumina). Additional details are provided in Supplementary Methods.

Immunohistochemistry and in situ hybridization
Timed litters from crosses of WT female C57/Bl6 mice to cRE1 dup/+ males were collected at E10.5, E12.5, E14.5 and E16.5, cryosectioned and processed for immunofluorescent staining as described previously 38, using primary antibody combinations against ISL1, GATA2 and GATA3, or against ISL1, NR2F1 and GATA3. Similar E10.5, E12.5 and E14.5 litters, as well as testes from WT and cRE1 dup/+ adult males, were collected, cryosectioned and processed for in situ hybridization as described previously 85 using riboprobes for Isl1 and Gata2. Whole-mount E11.5 embryos were collected from WT crosses and processed for in situ hybridization as described previously 86 using the Isl1 and Gata2 riboprobes. Additional details are provided in Supplementary Methods.

Histological examination of r4MN identity, migration and birthdate
For examination of r4MN migration, cell identity and birthdate, WT female C57/Bl6 mice were crossed to cRE1 dup/+ males and received a single intraperitoneal injection of EdU (50 mg kg−1; Thermo Fisher Scientific, catalog no. A10044) at E9.25, E10 or E10.5. E14.5 embryos were dissected, fixed, cryosectioned, collected on to glass slides, immunostained with guinea-pig anti-ISL1 and rabbit anti-GATA2 primary antibodies, incubated with Alexa Fluor-488 anti-guinea-pig and Alexa Fluor-647 anti-rabbit secondary antibodies, processed for EdU detection using azide-conjugated Alexa Fluor-555 and coverslipped. The methods used are as described previously 38. Sections were imaged on a Zeiss LSM 980 confocal microscope with a ×20 objective and a 3-μm step size. For each embryo, bilateral ISL1 ON r4MNs were analyzed caudally to rostrally, beginning at the first section rostral to the hypoglossal nucleus and ending at the first section in which IEEs were no longer present (at the level of the trigeminal motor nucleus). Cells from every fourth cryosection were counted semiautomatically in three dimensions using arivis Vision4D ×64 analysis operations. Additional details are provided in Supplementary Methods.

Cell count statistical analysis
Statistical analysis and all plotting were performed using RStudio build 554 and R v.4.2.1 with tidyverse package v.1.3.1. Statistics were calculated using an unpaired, two-sided Student's t-test with the function stat_compare_means from the ggpubr 0.4.0 package.
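For readers unfamiliar with the test, the statistic behind an unpaired Student's t-test can be sketched as below. This is an illustrative Python sketch rather than the R code used in the study; it assumes the classic pooled-variance form (the text does not say whether equal variances were assumed), the cell counts are invented, and the P value, which requires the t distribution, is left to a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Return (t statistic, degrees of freedom) for two independent samples.

    Uses the pooled-variance (equal variance) form of Student's t-test.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled estimate of the common variance from both samples.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical per-embryo cell counts for two genotypes.
group_a = [10, 12, 14]
group_b = [20, 22, 24]
t_stat, dof = student_t(group_a, group_b)
print(round(t_stat, 4), dof)
```

A two-sided P value would then be obtained from the t distribution with the returned degrees of freedom.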

Birthdating statistical analysis
The average unilateral number of r4MNs labeled by single EdU injections at E8.5, E9.25, E10.0 and E10.5 was determined as above and in Supplementary Methods. The proportions of EdU-labeled IEEs and FBMNs were calculated by dividing the number of cells labeled from each population by the total number of EdU-labeled r4MNs detected for each embryo and averaging these percentages. Statistical significance was defined by P < 0.05 from an unpaired, two-sided Student's t-test, calculated and plotted using R v.4.2.1.
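The proportion calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the total number of EdU-labeled r4MNs per embryo is taken to be the sum of labeled IEEs and FBMNs, and the dictionary keys and example counts are hypothetical.

```python
from statistics import mean


def mean_edu_fractions(embryos):
    """Average per-embryo fractions of EdU-labeled IEEs and FBMNs.

    embryos is a list of dicts with EdU-labeled cell counts under the
    keys "IEE" and "FBMN". Embryos with no labeled r4MNs are skipped.
    Returns (mean IEE fraction, mean FBMN fraction).
    """
    iee_fractions, fbmn_fractions = [], []
    for counts in embryos:
        total = counts["IEE"] + counts["FBMN"]
        if total == 0:
            continue
        # Fractions are computed within each embryo before averaging.
        iee_fractions.append(counts["IEE"] / total)
        fbmn_fractions.append(counts["FBMN"] / total)
    return mean(iee_fractions), mean(fbmn_fractions)


# Hypothetical counts from two embryos injected at the same timepoint.
example = [{"IEE": 30, "FBMN": 70}, {"IEE": 20, "FBMN": 80}]
iee_mean, fbmn_mean = mean_edu_fractions(example)
print(iee_mean, fbmn_mean)
```

Averaging per-embryo fractions, rather than pooling counts across embryos, gives each embryo equal weight regardless of how many labeled cells it contains.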

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
Publicly available ChIP-seq datasets used in the present study: accession nos. GSM1817193 and GSM714811 for NR2F1; GSM714812 for NR2F2; GSM935589 for GATA2; and GSM1010738 and GSM1602667 for GATA3. Conserved TF-binding sites were obtained using rVista 2.0 (https://rvista.dcode.org). Additional epigenetic data were explored using the ENCODE database (https://www.encodeproject.org). GRCh37/hg19 human reference genome under Sequence Read Archive (SRA) accession no. PRJNA31257 and GRCm38/mm10 mouse reference genome under SRA accession no. PRJNA20689 were used for the alignment of human and mouse sequencing data, respectively. gnomAD and 1000 Genomes frequencies were extracted from https://gnomad.broadinstitute.org and https://www.internationalgenome.org, respectively. Common structural variant data were obtained from the DGV (http://dgv.tcag.ca/dgv/app/home) and GoNL SV database (https://www.nlgenome.nl/login). Exome sequence and SNP data from a subset of participants are available through dbGaP Phs001383.v1.p1. WGS data from Cohort 1 participants are available through dbGaP Phs001247.v1.p1; Radboudumc consent does not allow for broad sharing via repositories and, thus, Cohort 2 WGS data are available on request and after a positive evaluation by a local data access committee confirming that the proposed re-use is in line with original consent obtained. ScRNA-seq and CUT&Tag sequencing data are available through the National Center for Biotechnology

Code availability
The code used for scRNA-seq and single-cell CUT&Tag data processing and analyses is available at https://zenodo.org/badge/latestdoi/637923997.

[Figure panel, Family 11: sequence TTCAGAGAGCCCAAGCCACTGAAGCAATAGCTTCTCCTTT; coordinates chr3:128,194,582 and chr3:128,174,929]