The Human Accelerated Region HACNS1 modifies developmental gene expression in humanized mice

Morphological innovations that arose during human evolution are ultimately encoded in genetic changes that altered development. Human Accelerated Regions (HARs), which include developmental enhancers that harbor a significant excess of human-specific sequence changes, are leading candidates for driving novel physical modifications in humans. Here we examine the role of the HAR HACNS1 (also known as HAR2) in human limb evolution by directly interrogating its cellular and developmental functions in a humanized mouse model. HACNS1 encodes an enhancer with human-specific activity in the developing limb in transgenic mouse reporter assays, and exhibits increased epigenetic signatures of enhancer activity in the human embryonic limb compared to its orthologs in rhesus macaque and mouse. Here we find that HACNS1 maintains its human-specific enhancer activity compared to its chimpanzee ortholog in the mouse embryonic limb, and that it alters expression of the transcription factor gene Gbx2 during limb development. Using single-cell RNA-sequencing, we demonstrate that Gbx2 is upregulated in humanized limb bud chondrogenic mesenchyme, implicating HACNS1-mediated Gbx2 expression in early skeletal patterning. Our findings establish that HARs direct changes in the level and distribution of gene expression during development, and illustrate how humanized mouse models provide insight into regulatory pathways modified in human evolution. 2 . CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; https://doi.org/10.1101/2019.12.11.873075 doi: bioRxiv preprint


Introduction
The evolution of uniquely human physical traits, including changes in limb morphology that allow us to use tools and to walk upright or the expansion of the cerebral cortex, required human-specific genetic changes that altered development (Aiello and Dean, 1990;Carroll, 2003;Geschwind and Rakic, 2013;Reilly and Noonan, 2016). Discovering the locations of these changes and determining their biological impact is a major challenge. However, over the last decade comparative studies have begun to reveal potential genetic drivers underlying uniquely human morphological features. These efforts have identified a prominent class of elements in the genome that are highly conserved across many species but show a significant excess of humanspecific sequence changes (Capra et al., 2013;Lindblad-Toh et al., 2011;Pollard et al., 2006Pollard et al., , 2010Prabhakar et al., 2006). These elements, collectively named Human Accelerated Regions (HARs), are prime candidates to encode novel human molecular functions. Experimental studies revealed that many HARs act as transcriptional enhancers during embryonic development, particularly in structures showing human-specific morphological changes such as the brain and limb (Capra et al., 2013;Cotney et al., 2013;Kamm et al., 2013;Prabhakar et al., 2008;Reilly et al., 2015;Won et al., 2019). HARs have also been shown to exhibit human-specific changes in enhancer activity, both in transgenic mouse enhancer assays and in massively parallel reporter assays in cultured cells (Capra et al., 2013;Kamm et al., 2013;Prabhakar et al., 2008;Ryu et al., 2018;Uebbing et al., 2019). These findings suggest a critical contribution for HARs in human evolution and support the long-standing hypothesis that changes in gene regulatory programs during development contribute to evolutionary innovation (Britten and Davidson, 1971;King and Wilson, 1975;Wray, 2007).
Despite these advances, the role of HARs in altering developmental processes remains poorly understood. Reporter assays in transgenic mice and cultured cells have identified HARs that show differential enhancer activity compared to their chimpanzee orthologs, but cannot provide insight into developmental phenotypes. Similar approaches have been used to drive expression of HAR putative target genes in transgenic mice. For example, the potential impact of HARE5 on expression of its putative target gene FZD8 was assessed using transgenes in which either HARE5 or its chimpanzee ortholog drove expression of a mouse Fzd8 cDNA (Boyd et al., contribute to our understanding of HAR regulatory activity, they do not address how HARs function in native genomic contexts. Here we use a humanized mouse model approach to directly characterize the effect of human-specific sequence changes in HARs on gene expression and regulation during embryonic development. We chose to model HACNS1 (also known as HAR2 or 2xHAR.3), which exhibits the strongest acceleration signature of any noncoding HAR yet identified, with 13 humanspecific substitutions in a 546 base-pair interval (Fig. 1). HACNS1 was also the first HAR demonstrated to exhibit a human-specific gain in enhancer activity during development. In a mouse transgenic enhancer assay, HACNS1 was shown to drive increased expression of a LacZ reporter gene in the embryonic mouse limb compared to its chimpanzee and rhesus macaque orthologs ( Fig.1B; Prabhakar et al., 2008). Furthermore, HACNS1 exhibits increased levels of the histone modification H3K27ac, which is correlated with enhancer activity, in the human embryonic limb compared to rhesus macaque and mouse ( Fig. 1C; Cotney et al., 2013).
Together, these findings suggest that HACNS1 may have contributed to changes in limb development during human evolution.
We used homologous recombination to replace the mouse ortholog of HACNS1 with the human or chimpanzee sequence, enabling us to compare the functions of all three orthologs in the same developmental system. We found that HACNS1 maintains its human-specific activity in the mouse embryo, altering the expression of the nearby transcription factor Gbx2 in the developing limb bud. Through single-cell RNA-sequencing, we showed that the gain of function in HACNS1 alters gene expression in limb chondrocytes, suggesting it may affect skeletal morphogenesis. Our findings establish that HARs direct human-specific changes in gene expression during development, and illustrate how humanized mouse models provide insight into regulatory pathways and developmental mechanisms modified in human evolution.

Design and generation of the HACNS1 humanized mouse line
To generate a humanized HACNS1 mouse model, we designed a targeting construct for homologous recombination that included a 1.2 kb human sequence encompassing HACNS1, previously shown to encode human-specific enhancer activity in transgenic mouse embryos (Prabhakar et al., 2008). We replaced the orthologous mouse locus using homology-directed repair in C57BL6/J-A w-J /J (B6 agouti) embryonic stem (ES) cells (Fig. 1D, Fig. S1A,B; Materials and Methods). To provide a control that would enable us to identify bona fide human-specific functions of HACNS1, we used the same approach to generate a mouse model for the orthologous chimpanzee sequence. The 1.2 kb chimpanzee sequence includes 22 single nucleotide differences from the human allele, 15 of which are specific to the human lineage compared to other primates (See Materials and Methods).
In order to verify the integrity of the edited sequences, we sequenced a 40 kb region encompassing the human or chimpanzee sequence replacement, the homology arms used for targeting, and flanking genomic regions in mice homozygous for either HACNS1 (HACNS1 +/+ ), or the chimpanzee ortholog ( Fig. S1C; Materials and Methods). We found no evidence of aberrant editing, sequence rearrangement, or other off-target mutations at either edited locus. We also verified that each homozygous line carried two copies of the human or chimpanzee sequence using quantitative real-time PCR (qRT-PCR) (Fig. S1D). 6 . CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. The location of HACNS1 in the human genome (GRCh37/hg19), relative to the nearby genes AGAP1 and GBX2. (B) LacZ reporter gene expression (in blue) driven by HACNS1, the chimpanzee ortholog, or the rhesus macaque ortholog in embryonic day 11.5 (E11.5) transgenic mouse embryos. Images are adapted from Prabhakar et al. 2008. (C) The normalized H3K27ac signal in the developing limb bud at HACNS1 and the orthologous loci in rhesus macaque and mouse. Data were obtained from Cotney et al. 2013. The region of significant H3K27ac enrichment in human is shown by a black bar (top), and the location of HACNS1 is shown as a green bar. (D) Alignment of the human sequence used to generate the HACNS1 humanized mouse with orthologous sequences from other vertebrate genomes, obtained from the UCSC hg19 100way Multiz alignment (see Table S1 for coordinates). The chimpanzee orthologous sequence used to generate the chimpanzee control line is highlighted in olive, and the mouse sequence replaced in each line is highlighted in teal. The location of each human-specific substitution is shown by a red line, and the corresponding positions in the alignment are highlighted in yellow. The locations of HACNS1 (Prabhakar et al., 2006) and 2xHAR3 (Lindblad-Toh et al., 2011) are shown below the alignment.

HACNS1 exhibits human-specific enhancer activity in the humanized mouse embryo.
We used chromatin immunoprecipitation (ChIP) to determine if HACNS1 exhibits epigenetic signatures of increased enhancer activity in the humanized mouse model. We initially focused on the limb for epigenetic profiling, as the human-specific activity of HACNS1 was previously demonstrated in mouse embryonic limb in transgenic reporter assays and by a gain of H3K27ac in human embryonic limb (Cotney et al., 2013;Prabhakar et al., 2008). We profiled H3K27ac and H3K4me2, which are both associated with enhancer activity, in embryonic day 11.5 (E11.5) limb buds from HACNS1 +/+ embryos, embryos homozygous for the chimpanzee ortholog, and wild type embryos. We found a strong signature of H3K27ac marking at HACNS1 in HACNS1 +/+ limb (Fig. 2, S2). The chimpanzee and mouse sequences both showed significant but weaker H3K27ac enrichment than the human sequence, supporting the conclusion that HACNS1 maintains its human-specific enhancer activity in the humanized mouse model.
To quantify the differences in epigenetic marking we observed, we used DESeq2 implemented in HOMER (Materials and Methods; Heinz et al., 2010;Love et al., 2014) to identify genome-wide significant changes in H3K27ac and H3K4me2 levels in limb bud of mice homozygous for HACNS1 or the chimpanzee ortholog versus wild type. We found that H3K27ac and H3K4me2 levels at the edited HACNS1 locus were significantly increased in HACNS1 +/+ limb bud compared to wild type (Fig. 2, S2, Table S4). Although the chimpanzee locus also exhibited increased H3K27ac marking over input controls, the level in the chimpanzee locus was not significantly differentially increased compared to wild type (Fig. S2A, Table S4). The levels of H3K4me2 were significantly increased at both the humanized and the chimpanzee orthologous loci in each respective line compared to wild type ( Fig. 2, S2, Table S4). Increased levels of H3K4me2 coupled with low levels of H3K27ac are associated with weak enhancer activity, so it is likely that the chimpanzee sequence is not acting as a strong enhancer in the limb bud overall but may be active in other tissues , a finding further supported by gene expression analyses described in Figure 3 below. However, our results suggest that the chimpanzee ortholog may show stronger enhancer activity than the mouse sequence, pointing to potential primate-rodent differences in addition to the strong human-specific gain of function in

HACNS1.
Because previous transgenic mouse enhancer assays showed that HACNS1 also drives increased reporter gene expression activity in the pharyngeal arch (Prabhakar et al., 2008), we profiled H3K27ac and H3K4me2 in pharyngeal arch tissue from HACNS1 +/+ embryos, embryos homozygous for the chimpanzee ortholog, and wild type embryos at E11.5. We detected reproducible, significant enrichment of H3K27ac in the pharyngeal arch at the humanized and orthologous chimpanzee locus compared to input controls, but H3K27ac marking at the humanized locus was not significantly different than that at the wild type locus (Fig. S2). We did, however, identify a significant gain of H3K4me2 signals in the pharyngeal arch at the humanized and orthologous chimpanzee locus compared to wild type (Fig. S2, Table S4).
In order to identify downstream epigenetic changes resulting from HACNS1 activation in HACNS1 +/+ limb bud, we searched for other genome-wide gains of H3K27ac and H3K4me2 at enhancers and promoters. We identified a significant gain of H3K27ac in HACNS1 +/+ limb bud versus wild type at the promoter of the nearby gene Gbx2 (Fig. 2, S2, Table S4). While significant H3K27ac enrichment was found in all three lines at the Gbx2 promoter compared to input controls, H3K27ac levels were not significantly increased at Gbx2 in limb buds homozygous for the chimpanzee ortholog compared to wild type, indicating the gain of activity is specific to HACNS1. H3K4me2 was also enriched at the promoter of Gbx2 in all three lines compared to input controls.
Gbx2 encodes a transcription factor with multiple functions during development. GBX2 has been implicated in midbrain and hindbrain development (Li et al., 2005;Wassarman et al., 1997), guidance of thalamocortical projections (Chatterjee et al., 2012;Hevner et al., 2002;Miyashita-Lin et al., 1999), ear development (Lin et al., 2005;Miyazaki et al., 2006), and pharyngeal arch patterning (Byrd and Meyers, 2005). Gbx2 was previously found to be expressed in developing mouse limb; however, its role in limb development remains undetermined given the lack of a reported limb phenotype in Gbx2 knockout mice (Wassarman et al., 1997). Our findings suggest that Gbx2 is a regulatory target of HACNS1, and that HACNS1 gain of function may alter its expression in the humanized mouse limb.

HACNS1 drives spatial and quantitative changes in Gbx2 expression in the limb bud
To identify potential expression changes resulting from HACNS1-driven activation of the Gbx2 promoter in humanized mouse embryos, we used in situ hybridization (ISH). We found that HACNS1 +/+ E11.5 embryos showed increased Gbx2 expression in both forelimb and hindlimb compared to embryos homozygous for the chimpanzee ortholog and wild type embryos ( Fig. 3A). We observed an increase in Gbx2 expression in HACNS1 +/+ embryos in two distinct anterior and posterior regions in the forelimb and hindlimb bud, as well as an anterior proximal region in the latter. Overall, Gbx2 limb bud expression was temporally dynamic, decreasing from E11 to E12. We therefore established a fine staging scheme to characterize changes in Gbx2 expression within this short time interval. We used crown-rump length measurements to assign embryos to 6 temporally ordered groups (designated T1-T6, and ranging from approximately 36 to 43 somites; Tam, 1981) (Fig. 3A).
We identified substantial changes in the distribution of Gbx2 expression in the forelimb and hindlimb buds of HACNS1 +/+ embryos compared to both homozygous chimpanzee ortholog embryos and wild type embryos across all 6 developmental stages (Fig. 3, S3). At the earliest time point, we found that Gbx2 was strongly expressed in distinct anterior-distal and posterior domains in HACNS1 +/+ forelimb and hindlimb bud (Fig. 3, S3). Robust expression of Gbx2 in HACNS1 +/+ limb buds persisted throughout the remaining timepoints, though the size of the anterior and posterior domains decreased over time. We also observed that strong expression of Gbx2 in HACNS1 +/+ embryos persisted for a longer period of time in hindlimb bud than in forelimb, consistent with the delayed developmental maturation of the hindlimb compared to forelimb (Martin, 1990). In addition, HACNS1 +/+ embryos showed a hindlimb-specific anteriorproximal expression domain adjacent to the body wall across all 6 timepoints (Fig. S3B).
Compared to the robust expression in the humanized limb, Gbx2 expression in both homozygous chimpanzee ortholog and wild type limb buds was weak and restricted to early time points (Fig. 3, S3A). Both homozygous chimpanzee ortholog embryos and wild type embryos showed a weak distal Gbx2 expression pattern in early forelimb and hindlimb that was primarily restricted to the anterior portion of the limb bud (Fig. 3, S3A). While weak distal expression was restricted to approximately T1-T2 and T1-T4 in wild type forelimb and hindlimb, respectively, it persisted until approximately T3 in forelimb and T5 in hindlimb in homozygous chimpanzee ortholog embryos (Fig. 3, S3A). This finding further supports that the chimpanzee ortholog may exhibit greater enhancer activity than the mouse ortholog, but substantially less activity than

HACNS1.
In addition to the forelimb and hindlimb bud, Gbx2 was also expressed in the neural tube, diencephalon, and pharyngeal arch of HACNS1 +/+ embryos, homozygous chimpanzee ortholog embryos, and wild type embryos (Fig. 3A, S3C). Whereas Gbx2 expression was primarily restricted to the first pharyngeal arch in chimpanzee ortholog line and wild type embryos, we observed a gain of a dorsal expansion of Gbx2 expression into the second pharyngeal arch in HACNS1 +/+ embryos during T1-T5 (Fig. S3C). However, we chose to focus on limb due to a lack of comparative epigenetic profiling data in human pharyngeal arch and the absence of a significant gain in H3K27ac marking in HACNS1 +/+ versus wild type pharyngeal arch.
In order to quantify the gain of Gbx2 expression in humanized limb bud, we used realtime quantitative reverse transcription PCR (qRT-PCR) in forelimb and hindlimb buds from HACNS1 +/+ embryos, homozygous chimpanzee ortholog embryos, and wild type embryos at each of the six developmental stages we defined above. We found that Gbx2 showed increased expression in HACNS1 +/+ forelimb and hindlimb versus the chimpanzee ortholog line and wild type at all 6 timepoints (Fig. 3C). Although we detected an increase in Gbx2 expression in the chimpanzee ortholog line versus wild type in early forelimb and hindlimb timepoints, this change was weaker than that between HACNS1 +/+ and wild type. Consistent with our ISH results, we found that Gbx2 expression in humanized forelimb and hindlimb was strongest at the earliest timepoints and that expression persisted longer in hindlimb than forelimb (Fig. 3C). In order to determine the significance of the effects of genotype and timepoint on Gbx2 expression, we used analysis of variance (ANOVA). We determined that genotype, timepoint, and genotypetimepoint interaction effects were significant, indicating that Gbx2 expression was significantly increased in humanized mouse. While Gbx2 expression declines over time in all three genotypes, it persists longer in humanized forelimb and hindlimb (Tables 1, 2).  T1  T2  T3  T4  T5  T6   T1  T2  T3  T4  T5  13 . CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. 14 . CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; 15 . CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; https://doi.org/10.1101/2019.12.11.873075 doi: bioRxiv preprint

HACNS1 drives increased Gbx2 expression in limb chondrogenic mesenchymal cells
Using bulk tissue analyses, we found that HACNS1 drives increased Gbx2 expression in distinct anterior and posterior domains in the humanized limb bud. In order to identify the precise cell types expressing Gbx2 as well as genes potentially coregulated with Gbx2, we performed single-cell RNA-sequencing (scRNA-seq) in E11.5 hindlimb bud from HACNS1 +/+ embryos, homozygous chimpanzee ortholog embryos, and wild type embryos. Using the 10x Genomics scRNA-seq platform for cell barcoding, library preparation, and sequencing, we obtained transcriptomes from approximately 10,000 cells per genotype. We used the Seurat toolkit for data preprocessing and library size normalization (see Materials and Methods;Stuart et al., 2019). During pre-processing, we removed endothelial and blood cells (Cd34-positive, Civin et al., 1984;Pf4-positive, Deutsch et al., 1955;Hbb-positive, Bunn and Forget, 1986), as preliminary analysis indicated that these transcriptionally and developmentally distinct cell types do not express Gbx2 in any of our datasets. After normalization of the filtered data using the SCTransform method in Seurat and integration of data from all samples into a single dataset using the Seurat v3 integration workflow (see Materials and Methods), we performed a clustering analysis on the integrated dataset to identify cell type categories present in all three genotypes (Stuart et al., 2019). In order to visualize similarities between cells, we used Uniform Manifold Approximation and Projection for Dimensional Reduction (UMAP), a dimensionality reduction method for data visualization (McInnes et al., 2018).
Though the UMAP embedding reveals a continuum of cell types, we used the Louvain method for community detection to obtain a finer granularity of subtyping (Blondel et al., 2008). This analysis revealed three distinct groups: clusters 1-4 comprised mesenchymal cell subtypes based on expression of the known marker genes Sox9 (clusters 1, 3, 4; Wright et al., 1995), Bmp4 (cluster 2; Gañan et al., 1996), Shox2 (cluster 1; Clement-Jones, 2000), and Hoxd13 (clusters 1-4; Favier, 1997) (Fig. 4A). The remaining two groups comprised non-mesenchymal cell types, including myogenic cells (cluster 5, Myod1; Sassoon et al., 1989) and ectodermal cells (cluster 6, Fgf8; Martin, 1998) (Fig. 4A). Furthermore, the UMAP embedding revealed finer separation of mesenchymal cells according to known limb geographical patterning markers. We first examined the expression of known proximal-distal limb bud markers Meis1, Hoxa11, and Hoxd13 (proximal, medial, and distal, respectively; Tabin and Wolpert, 2007). Cells expressing each of these markers showed a distinct localization in the UMAP embedding (Meis1+ cells in the top 16 . CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; https://doi.org/10.1101/2019.12.11.873075 doi: bioRxiv preprint left, Hoxa11+ cell in the center, and Hoxd13+ cells in the lower right; Fig. 4B, upper left) suggesting our analysis recovered transcriptional and cell-type transitions along the proximaldistal patterning axis (Fig. 4B).
We also found that the first axis of the UMAP embedding clearly recapitulates known gene expression gradients along the anterior-posterior limb bud axis based on expression of the anterior-proximal marker Irx3 (Li et al., 2014), the anterior marker Zic3 (Quinn et al., 2012), and the posterior-proximal marker Shh (Riddle et al., 1993) (Fig. 4B, upper right). Using markers of chondrogenic (Sox9, Shox2; Wright et al., 1995;Clement-Jones, 2000) versus non-chondrogenic (Bmp4; Gañan et al., 1996) mesenchyme, we found that the second UMAP axis follows the chondrogenic versus interdigital apoptotic fate gradient (Fig. 4B). We found that the expression patterns of the aforementioned markers were broadly conserved between genotypes, with each genotype showing comparable subsets of proximal, distal, anterior, posterior, chondrogenic, and non-chondrogenic cell types (Fig. S4). Collectively, these results establish that our scRNA-seq analysis identified specific conserved cell types and spatial transcriptional gradients in the developing hindlimb bud across all three genotypes.
We then sought to define genotype-specific differences in Gbx2 expression. In order to identify the cell types expressing Gbx2 in humanized limb bud, we examined the distribution of Gbx2-positive cells across cell clusters in hindlimb buds from all three genotypes. We found that Gbx2 was upregulated in HACNS1 +/+ hindlimb versus homozygous chimpanzee ortholog and wild type hindlimbs, primarily in the mesenchymal cell clusters (clusters 1-4), consistent with our ISH and qRT-PCR expression analyses ( Fig. 4C and Fig. 3). In HACNS1 +/+ hindlimb, 24% of cells expressed Gbx2, 96% of which were mesenchymal cells, whereas less than 1% of homozygous chimpanzee ortholog or wild type cells expressed Gbx2 (see Materials and Methods; Table S10). UMAP embedding of these cells reveals that Gbx2-positive cells in HACNS1 +/+ hindlimb bud largely cluster within a distinct subset of mesenchymal cells belonging primarily to Louvain clusters 1, 2 and 4 ( Fig. 4C, D; Table S10).

17
. CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019.  18 . CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ;  (Linderman et al., 2018) and centered and scaled using z-scores.

19
. CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; https://doi.org/10.1101/2019.12.11.873075 doi: bioRxiv preprint To find genes whose expression is associated with Gbx2, we used k-Nearest-Neighbors Conditional-Density Resampled Estimate of Mutual Information (kNN-DREMI), which computes scores quantifying the strength of the relationship between two genes (van Dijk et al., 2018;Krishnaswamy et al., 2014). Using kNN-DREMI scores, we ranked each gene expressed in HACNS1 +/+ limb by the strength of its association with Gbx2. To determine if genes associated with Gbx2 were enriched in particular functions, we performed Gene Set Enrichment Analysis (GSEA) on all expressed genes ranked by strength of association with Gbx2. We found that We also used an orthogonal approach that is naïve to our identification of Gbx2 as the target of HACNS1 in order to characterize cells with altered transcriptional profiles in humanized limb bud. We used Manifold Enhancement of Latent Dimensions (MELD), a method that uses graph signal processing to identify gene expression trends in the data corresponding to genotype (Burkhardt et al., 2019), in order to distinguish cells that are most prototypical of an experimental condition -in our case, HACNS1 +/+ hindlimb cells versus hindlimb cells from the chimpanzee ortholog line and wild type. The output of MELD is an "Enhanced Experimental Signal," which assigns a score for each cell that reflects the degree to which that cell is altered by the experimental condition -in this case, humanization (Burkhardt et al., 2019). In order to identify gene expression patterns characteristic of humanized hindlimb bud cells, we used kNN-DREMI to associate gene expression with EES scores generated by MELD. We then used the resulting gene rankings to identify enriched biological functions via GSEA, as previously described for Gbx2 expression. We found that genes associated with both EES scores and increased Gbx2 expression converged on related biological processes. Performing GSEA using genes ranked by mutual information with EES score revealed significant enrichment of the (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; https://doi.org/10.1101/2019.12.11.873075 doi: bioRxiv preprint P=1.11x10 -3 ), "Collagen Fibril Organization" (KS P=1.40x10 -5 ), and "Positive Regulation of Phosphatidylinositol 3-kinase Signaling" (KS P=4.40x10 -5 ). This last category, along with collagen expression, is implicated in chondrocyte differentiation (Kita et al., 2008;Lefebvre et al., 1997; Fig. 5A, Table S12) The convergence of both the Gbx2-centric and Gbx2-naïve approaches to identifying downstream effects of humanization on overlapping sets of genes implicated in chondrocyte differentiation (as shown in Fig. 5B) led us to examine the expression patterns of chondrocyte differentiation-related genes in humanized mesenchymal cells belonging to Louvain clusters 1-4 ( Fig. 5C). We clustered HACNS1 +/+ mesenchymal cells by EES score and Gbx2 expression and examined the expression of the "Chondrocyte Differentiation" ontology genes within Gbx2positive cells in order to visualize the expression of genes in the ontology compared to Gbx2 (Fig. 5C). This analysis identified Gbx2-positive humanized cells that showed higher expression of positive regulators of chondrocyte differentiation (e.g. Sox9, Col2a1,Bmp2,and Runx2) compared to other mesenchymal cells, indicating that HACNS1-driven upregulation of Gbx2 occurs in chondrogenic cells destined for digit formation ( Fig. 5C; Lefebvre et al., 1997;Shea et al., 2003;Takigawa et al., 2010;Yoshida et al., 2004). We also identified a subset of Gbx2positive humanized cells that showed expression of Bmp4, which is expressed in the apoptotic interdigital domains (Francis et al., 1994). These findings suggest that upregulation of Gbx2 impacts the interdependent pathways of both digit condensation and interdigital cell fate specification required for digit morphogenesis.

21
. CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; https://doi.org/10.1101/2019.12.11.873075 doi: bioRxiv preprint C Gbx2

22
. CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; https://doi.org/10.1101/2019.12.11.873075 doi: bioRxiv preprint

Fig. 5. Gbx2-positive mesenchymal cells in HACNS1 +/+ limb bud express markers of chondrocyte differentiation. (A)
Ontology enrichments of genes with expression associated with Gbx2 expression (top) and the humanized state (EES score, bottom) in HACNS1 +/+ mesenchyme. The log-transformed Gene Set Enrichment Analysis Kolmogorov-Smirnov P value for each category is plotted on the x-axis. Ontologies shown are those overlapping in the Gbx2 expression and EES score ontology enrichments lists. See also tables S11 and S12. (B) EES and Gbx2 kNN-DREMI scores plotted for all genes. Genes ranked in the top 20% of kNN-DREMI scores in the Chondrocyte Differentiation ontology (GO:0002062) for the union of the EES and Gbx2 kNN-DREMI analysis gene lists are colored in red and labeled. (C) Heatmap showing expression of genes belonging to the ontology "Chondrocyte Differentiation" (GO:0002062) in all HACNS1 +/+ mesenchymal cells (Louvain clusters 1-4). Hierarchical clustering was used to determine the order of cells (in columns) and genes (in rows). The bar at the top of the heatmap shows Gbx2positive and Gbx2-negative cells in red and gray, respectively. The gene expression values shown are imputed using ALRA (Linderman et al., 2018) and centered and scaled using zscores.

23
. CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; https://doi.org/10.1101/2019.12.11.873075 doi: bioRxiv preprint

Humanized mouse limbs show no overt morphological changes
In order to determine if Gbx2 upregulation and downstream transcriptional changes in HACNS1 +/+ limb buds affect digit formation or overall limb morphology, we performed morphometric analysis of skeletal preparations at E18.5 for HACNS1 +/+ embryos, homozygous chimpanzee ortholog embryos, and wild type embryos (Materials and Methods). We found no gross morphological differences among genotypes. The three major limb segments (autopod, zeugopod, and stylopod) were present in HACNS1 +/+ and homozygous chimpanzee ortholog skeletons (Fig. 6, S5). To assess fine morphological differences, we examined digit length (normalized to body size based on ossified humerus length), and intradigital (phalange to metacarpal or metatarsal length) and interdigital ratios. Again, we found no significant differences in digit length or proportions between HACNS1 +/+ , homozygous chimpanzee ortholog and wild type autopods (Fig. 6, S5, Tables S6-S8).

24
. CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made  Fig. 6. Limb skeletal morphology is overtly similar in HACNS1 +/+ , chimpanzee ortholog line, and wild type mice. Representative images of E18.5 forelimbs and hindlimbs of the indicated genotypes stained with Alizarin Red (violet; bone) and Alcian Blue (blue; cartilage). Forelimb and hindlimb autopod, zeugopod, and stylopod are shown on the left, and numbers indicate digit identity. High magnification images of forelimb and hindlimb autopods are shown on the right.

25
. CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; https://doi.org/10.1101/2019.12.11.873075 doi: bioRxiv preprint

DISCUSSION
Understanding how uniquely human genetic changes altered developmental processes is essential to understanding the evolution of our species. Here we investigated the role of the Human Accelerated Region HACNS1 in human limb evolution by directly interrogating its biological functions in a humanized mouse model. This in vivo approach enabled us to identify the spatial and temporal changes in developmental gene expression driven by HACNS1 and to characterize the specific cell types affected by these changes, providing insight into the developmental processes modified due to human-specific alterations in enhancer activity.
First, we determined that HACNS1 is active in the mouse genomic context, recapitulating its human-specific function in the trans-regulatory environment of the developing limb. Second, we found that Gbx2 showed increased promoter activity in the HACNS1 +/+ limb bud, strongly supporting that it is regulated by HACNS1, and then demonstrated that HACNS1 drives strong and highly localized changes in Gbx2 expression in the forelimb and hindlimb bud. These findings support the long-standing hypothesis that discrete regulatory changes altering expression of pleiotropic developmental regulators in specific tissues contribute to the evolution of morphological differences across species (Britten and Davidson, 1971;King and Wilson, 1975;Wray, 2007). Third, we identified the specific cell types that show upregulation of Gbx2 in the developing limb. Our single-cell transcriptome analyses captured the spectrum of embryonic limb cell types, enabling us to not only identify the specific subset of mesenchymal cells that express Gbx2, but also to place those cells in the developmental process of chondrogenesis. By both characterizing Gbx2-positive humanized cells and identifying humanized-prototypical cells without reference to Gbx2 expression, we implicated changes in Gbx2 regulation in chondrocyte differentiation, a process that serves as the basis for digit formation.
Our finding that HACNS1-driven Gbx2 upregulation is implicated in chondrocyte development indicates a possible role for HACNS1 in the modification of forelimb and hindlimb digit proportions on the human lineage. A primary distinguishing trait of the human forelimb is the greater relative length of the thumb versus the other digits (Aiello and Dean, 1990). In humans, this is due in part to the shortening of digits 2-5, which correlates with opposability and dexterity of the hand (Almécija et al., 2015). Differences between human and great ape feet include shorter digits with a non-opposable hallux, longer heel bones, and unique transverse and longitudinal arches defined by relative metatarsal-tarsal joint orientation in humans (Holowka 26 . CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; https://doi.org/10.1101/2019.12.11.873075 doi: bioRxiv preprint and Lieberman, 2018; Holowka et al., 2017). The changes in limb development that confer human-specific morphological outcomes therefore involve discrete modifications to the proportions and positioning of ancestral skeletal structures.
Our findings indicating a role for the HACNS1-Gbx2 pathway in modifying digit chondrogenesis during human limb evolution serve as the foundation for identifying the broader network of developmental changes that drove the emergence of human-specific limb traits.
Relative phalange shortening in the forelimb and hindlimb are implicated in human limb evolution, and the similarity in Gbx2 expression patterns in humanized forelimb and hindlimb bud suggests that Gbx2 upregulation could contribute to parallel morphological changes by influencing early chondrocyte differentiation and digit formation in both the upper and lower limbs (Aiello and Dean, 1990;Almécija et al., 2015). While the cellular mechanisms controlling relative segment growth within digits remain unclear, studies in vertebrates with differing limb morphologies have shown that patterns of developmental gene expression tend to diverge during the pre-and post-patterning phases of limb bud development (Saxena et al., 2017). The timing of HACNS1-driven upregulation of Gbx2 at the nexus of the pre-and post-patterning phases in the humanized mouse limb is intriguing, given the plasticity of cell fate specification during this time period (Hiscock et al., 2017;Saxena et al., 2017). A shift in the chondrogenic trajectory favoring chondrocyte differentiation over proliferation in the presumptive phalanges, as suggested in this study, could therefore contribute to intradigital ratio modifications that evolved on the human lineage.
We have shown that humanized mouse models provide a viable and fruitful path for studying gene regulatory mechanisms in human evolution within the complete genomic, tissuelevel, and developmental framework of a living mammalian system. The effect of genetic changes in any one enhancer is likely to be insufficient to replicate human-specific morphological changes entirely in an experimental model. However, changes in expression of single genes, particularly pleiotropic regulators such as Gbx2, can have large molecular effects on developmental pathways. The HACNS1 humanized mouse model provides an entry point for understanding the larger developmental networks that changed during human limb evolution, of which Gbx2 is a part. Identification of changes in chondrogenesis downstream of HACNS1, in conjunction with our understanding of regulatory activity in the developing human limb, can be used to identify related factors modifying human chondrogenesis. Using humanized mouse 27 . CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; https://doi.org/10.1101/2019.12.11.873075 doi: bioRxiv preprint models, additional molecular drivers contributing to human-specific phenotypes may be studied at once, either through intercrossing mouse lines with unlinked loci or by iterative editing of one locus, allowing us to expand our understanding of human limb evolution. Furthermore, humanized mouse models provide the means to interrogate the larger developmental networks involving additional HARs in order to understand their contributions to human-specific developmental features. Our study thus provides a novel framework for using humanized mouse models to link individual sequences that arose on the human lineage since species divergence to the unique traits that distinguish our species.

28
. CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

29
. CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ;

Fig. S1. Development and validation of the HACNS1 and chimpanzee ortholog mouse models. (A)
Left: HACNS1 and chimpanzee ortholog line editing constructs are shown with the orthologous replaced region in mouse genome aligned to other vertebrate species derived from the 100-way Multiz alignment in the UCSC hg19 assembly. Nonpolymorphic, fixed human-specific substitutions are shown above the human construct and all human versus chimpanzee sequence differences are shown below. The transcription factor binding sites (TFBS) unique to HACNS1 versus the chimpanzee ortholog shown are based on predictions of JASPAR core mammal motifs in the human and chimpanzee genomes as described in Uebbing et al. 2019 (see also Table S13). Right: Embryonic stem cell editing construct showing antibiotic resistance (NeoR/KanR), diphtheria toxin (DTA), mouse homology, and location of human or chimpanzee sequences. (B) Left: PCR products generated with primers specific to the mouse, both HACNS1 and chimpanzee, and HACNS1 only orthologs were used for genotyping of HACNS1 +/+ (labeled as H), chimpanzee ortholog line (labeled as C), and wild type (labeled as W) mice from the F9 or later generation. Middle: PCR products were generated using primers outside the homology arms for Sanger sequencing of the edited locus. Right: PCR products were generated using primers anchored in the 5' homology arm and 14.8 kb upstream (5' adjacent sequence) and primers anchored in 3' homology arm and 15.1 kb downstream (3' adjacent sequence) for Sanger sequencing of the regions surrounding the editing locus. (C) Sanger sequencing contigs of cloned PCR products from (B), spanning the 40 kb region surrounding edited locus for HACNS1 +/+ (top) and the chimpanzee ortholog line (bottom). (D) HACNS1 +/+ , chimpanzee ortholog line, and wild type genomic DNA qPCR for primers specific to the edited region, 5' homology arm, 3' homology arm, and adjacent unedited region. All Ct values are normalized to a region on chromosome 5. Three biological replicate samples are shown per genotype. Error bars denote standard deviation between technical replicates.

30
. CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; https://doi.org/10.1101/2019.12.11.873075 doi: bioRxiv preprint

Fig. S2. H3K37ac and H3K4me2 ChIP-seq analyses in limb bud and pharyngeal arch.
Normalized H3K27ac (left) and H3K4me2 (right) epigenetic signals in the region spanning the full HACNS1-Gbx2 locus for two biological replicates per genotype for E11.5 limb bud (A, B) and pharyngeal arch (C, D). The location of the edited HACNS1 locus relative to nearby genes is shown above each track group with a green bar directly below the corresponding UCSC mm9 genome track. HACNS1 and Gbx2 loci are highlighted in gray. All significant peaks are represented by genotypespecific colored bars below the signal tracks for HACNS1 +/+ (in dark green), chimpanzee ortholog line (in olive), and litter-matched wild type (in teal). Peak calls showing significant signal increases between HACNS1 +/+ and litter-matched wild type, or chimpanzee ortholog line and litter-matched wild type, are shown as red squares below each track group. Detailed differential peak information is available in Table S4.

33
. CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ;

34
. CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made  Fig. S4. Limb bud patterning marker expression in HACNS1 +/+ , chimpanzee ortholog line, and wild type E11.5 hindlimb bud. UMAP embedding of hindlimb bud cells from HACNS1 +/+ , chimpanzee ortholog line, and wild type mice, showing conserved expression of representative proximal-distal (top row), anterior-posterior (middle row), and chondrogenesisapoptosis marker genes (bottom row). All gene expression data are imputed using ALRA (Linderman et al., 2018) and centered and scaled using z-scores. See text and Fig. 4B for details on marker genes. (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ;

Mouse Line Generation and Validation
The HACNS1 and chimpanzee ortholog lines were generated at the Yale Genome Editing Center using standard techniques in modified ES cells (Behringer et al., 2014). C57BL6/J-A w-J /J mouse ES cells (first described in Petersen et al., 2016) were edited by electroporation of a GFP cloning vector containing human (1,241 bp) or chimpanzee (1,240 bp) sequence flanked by C57BL/6J mouse sequence homology arms, floxed pPGKneo vector, and diphtheria toxin sequence (Fig. S1A). The genomic coordinates of the human (hg19) and mouse (mm9) sequences used in the editing constructs, including the mouse homology arm sequences, are listed in Table S1 (Rhead et al., 2009). G0 chimeras were backcrossed to wild type C57BL/6J and crossed with an actin-Cre C57BL/6J mouse line to remove vector sequence. All mice used in our analysis were from F9 or later generations. All animal work was performed in accordance with approved Yale IACUC protocols.
Genotyping primers specific to HACNS1, chimpanzee, and mouse orthologs are listed in Table S2. Cloning primers listed in Table S2 Table S2.
All biological replicates of each genotype were run in triplicate and Ct values of each were normalized to a control region on a different chromosome (see Table S2).

38
. CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ;https://doi.org/10.1101https://doi.org/10. /2019 Chromatin Immunoprecipitation, ChIP-qPCR and ChIP-seq Tissue for chromatin preparation was collected from E11.5 forelimb and hindlimb bud pairs or pharyngeal arch tissue from HACNS1 and chimpanzee ortholog line heterozygous crosses to obtain pooled, litter matched limb bud or pharyngeal arch samples for all three genotypes (HACNS1 +/+ , chimpanzee ortholog line, and wild type). Two biological replicates were used per genotype per tissue, each with tissue pooled from three embryos. Pooled tissue was crosslinked and sonicated as previously described (Cotney et al., 2012). Chromatin for each genotype, tissue, and replicate was used for H3K27ac or H3K4me2 immunoprecipitation using Active Motif #39133 and Active Motif #39913 antibodies as previously described Cotney et al., 2012). ChIP-qPCR was performed using Power SYBR Green Mastermix (Thermo Fisher Scientific #4368577) with primers listed in Table S3. Samples were sequenced (2×100 bp) using standard Illumina protocols on an Illumina HiSeq 4000. To control for batch effects, all samples of the same tissue type were multiplexed and sequenced on a single lane.

39
. CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; https://doi.org/10.1101/2019.12.11.873075 doi: bioRxiv preprint

RNA Extraction and qRT-PCR
E11-E12 embryos were collected from six HACNS1 +/+ , chimpanzee ortholog line, or wild type litters generated by crossing homozygous animals for each line. All embryos within each genotype group were ordered based on stage and were divided into six timepoint groups per genotype consisting of forelimb or hindlimb buds from 4-6 pooled embryos per time point per genotype per tissue. RNA was purified using the Qiagen miRNeasy Kit (#74106). Invitrogen Superscript III Reverse Transcription Kit (#18080-051) was used to prepare cDNA from each sample. qPCR with the resulting cDNA was performed using Power SYBR Green Mastermix (Thermo Fisher Scientific #4368577). All samples were analyzed in triplicate using primers listed in Table S5 and Ct values of Gbx2 were normalized to Hprt1.

Whole mount In Situ Hybridization
E11-E12 mouse embryos were collected from HACNS1 +/+ (n=7 litters), chimpanzee ortholog line (n=8 litters), and wild type (n=12 litters) homozygous crosses. Embryos were fixed and hybridized with the same preparation of antisense Gbx2 mRNA probe under identical conditions as previously described (Louvi and Wassef, 2000;Louvi et al., 2007). The Gbx2 probe used for hybridization contains the full mouse consensus CDS sequence (CCDS15150.1; NCBI CCDS Release 23). The 178 embryos were divided into sextiles based on crown-rump length and assessed for staining pattern by three individuals blinded to genotype under a stereo microscope (Leica S6D). Representative images were taken using a Zeiss Stemi stereomicroscope.

Sample Preparation
Tissue for scRNA-seq was collected from two each of HACNS1 +/+ , chimpanzee ortholog line, and wild type litters from E11.5 homozygous crosses. Left hindlimb buds from three embryos per genotype per replicate were pooled. Following dissection, the tissue was immediately placed in CMFSG saline-glucose solution (1x Calcium-magnesium-free phosphate buffered saline from Thermo Fisher Scientific #21-040-CV with 0.1% glucose from Corning 40 . CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

Filtering and Preprocessing
Matrices from the 10x Cell Ranger platform were filtered and preprocessed using Seurat v3.0.1 (Stuart et al., 2019). Prior to the generation of Seurat objects, Xist gene counts were eliminated in order to avoid clustering by sex within mixed sample populations. Genes expressed in fewer than 5 cells per sample were removed. Cells with greater than 7.5% or 2% counts from mitochondrial genes or hemoglobin genes, respectively, were removed. Cells with total gene count (nGene) z-scores less than -1 or greater than 4 were removed, as well as those with total UMI count (nUMI) z-scores greater than 7. One chimpanzee ortholog line replicate was removed during pre-processing due to high overall mitochondrial gene expression, indicative of low viability. Prior to data integration, expression values from each sample were normalized based on library size for pre-processing purposes only using the Seurat tool NormalizeData (Stuart et al., 2019). Louvain clustering as implemented in Seurat was performed for pre-processing purposes only using FindVariableFeatures, ScaleData, RunPCA, FindNeighbors, and FindClusters in order to remove endothelial cell clusters (Cd34-positive and Pf4-positive), clusters characterized by aberrant mitochondrial gene expression (low mt-Co1), and transcriptionally distinct clusters containing fewer than 30 cells per sample (Blondel et al., 2008; 41 . CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; https://doi.org/10.1101/2019.12.11.873075 doi: bioRxiv preprint Stuart et al., 2019). The numbers of cells remaining after pre-processing for each sample are listed in Table S9.

Data Normalization and Integration
All subsequent normalization and integration steps after pre-processing were performed with raw counts for all cells retained after pre-processing (see Table S9). Cell cycle scores were computed using CellCycleScoring in Seurat to regress out the difference between G2M and S phases in order to preserve only differences between cycling and non-cycling cells and regress out differences related to cell cycle amongst proliferating cells (Stuart et al., 2019). In addition to cell cycle scores, percent mitochondrial gene expression and nUMI values were regressed using SCTransform (SCT) in order to remove the effects of sequencing depth and minor differences in mitochondrial DNA expression related to viability . All SCT normalized datasets containing all genes from individual samples were integrated using SelectIntegrationFeatures, PrepSCTIntegration, FindIntegrationAnchors, and IntegrateData Stuart et al., 2019).
Following integration, the combined dataset was randomly down-sampled to contain 10,000 cells per genotype prior to embedding and clustering using SubsetData in Seurat (Stuart et al., 2019). PCA and UMAP were implemented in Seurat using RunPCA, RunUMAP, FindNeighbors, and FindClusters (McInnes et al., 2018;Stuart et al., 2019). Normalized data from all samples combined was used for imputation using ALRA with default settings (Linderman et al., 2018). The threshold for identifying Gbx2-positive cells was set as an imputed Gbx2 expression value greater than 0.1.

42
. CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; magic.MAGIC(knn=G.knn, decay=G.decay), and magic_op.fit_transform(data, graph=G) (van Dijk et al., 2018). MELD was run on one-hot vectors for each genotype independently with the same graph signal as MAGIC using meld_op.fit_transform(RES.toarray(), G) (Burkhardt et al., 2019). kNN-DREMI was run on MAGIC-imputed data, and with MELD EES values corresponding to the humanized condition for humanized cells, with scprep using scprep.stats.knnDREMI (Krishnaswamy et al., 2014).

Gene Set Enrichment Analysis
GSEA was performed using topGO v.2.34.0 on all expressed genes that were ranked by Gbx2-DREMI or EES-DREMI score from the aforementioned humanized mesenchymal cell kNN-DREMI analysis (Alexa and Rahnenfuhrer, 2018). Significant nodes were identified using the Kolmogorov-Smirnov test and elim algorithm. Annotated genes were marked as significant if ranked in the top 20% of kNN-DREMI scores. Ontologies listed in Tables S11 and S12 are the top 30 nodes with fewer than 100 annotated genes (to remove non-specific categories) and at least one significant gene. Heatmap hierarchical clustering was performed using pheatmap v1.0.12 (Kolde, 2015).

Skeletal Staining
E18.5 skeletons from two litters from each of HACNS1 +/+ , chimpanzee ortholog line, and wild type homozygous crosses (n=48 embryos) were stained with Alcian Blue and Alizarin Red as previously described (Behringer et al., 2014). Skeletons were imaged under a stereo microscope (Leica S6D). Bone and cartilage lengths of the forelimb and hindlimb pelvic girdle, stylopod, zeugopod, and autopod were measured blinded to genotype using ImageJ 2.0.0. CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; https://doi.org/10.1101/2019.12.11.873075 doi: bioRxiv preprint the sum of all metacarpal/metatarsal and phalanx segments. Raw measurements and digit length were normalized to the length of ossified humerus or femur for forelimb or hindlimb digits, respectively. Phalange to metacarpal ratio was calculated as the ratio of the sum of the phalange lengths of each digit to the corresponding metacarpal/metatarsal segment. Interdigital ratios were calculated using raw digit lengths. Raw measurements are available at noonan.ycga.yale.edu/noonan_public/Dutrow_HACNS1/.

ANOVA Analysis for Gene Expression and Morphometry
ANOVA analysis was performed with the lme4 package in R using default parameters to dissect the effects of genotype on Gbx2 expression (qRT-PCR data) and limb segment length (morphometric data) (Bates et al., 2015). For the qRT-PCR dataset, we employed an additive model of ΔCt to calculate the main effects of genotype and timepoint and a genotype:timepoint interaction term (Ct Gbx2 -Ct Hprt1 ~ Genotype * Timepoint). For the morphometric datasets, we calculated the effects of genotype, litter, sex, forelimb versus hindlimb, digit number, and right versus left (RL) on normalized digit length, phalange to metacarpal ratio and interdigital ratio (Length Ratio ~ Genotype * (1|Genotype/Litter) * Sex * Limb * Digit * (1|RL) * (1|Litter/Embryo) * (1|Sex/Embryo) * (1|Genotype/Embryo)). For both datasets, multiple comparisons adjustment was performed using the Benjamini & Hochberg method (Benjamini and Hochberg, 1995).

44
. CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; https://doi.org/10.1101/2019.12.11.873075 doi: bioRxiv preprint

Data Availability
The Gene Expression Omnibus accession number for the data reported in this paper is GSE141471.
. CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; https://doi.org/10.1101/2019.12.11.873075 doi: bioRxiv preprint . CC-BY-NC 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted December 12, 2019. ; https://doi.org/10.1101/2019.12.11.873075 doi: bioRxiv preprint