Modeling uniquely human gene regulatory function via targeted humanization of the mouse genome

The evolution of uniquely human traits likely entailed changes in developmental gene regulation. Human Accelerated Regions (HARs), which include transcriptional enhancers harboring a significant excess of human-specific sequence changes, are leading candidates for driving gene regulatory modifications in human development. However, insight into whether HARs alter the level, distribution, and timing of endogenous gene expression remains limited. We examined the role of the HAR HACNS1 (HAR2) in human evolution by interrogating its molecular functions in a genetically humanized mouse model. We find that HACNS1 maintains its human-specific enhancer activity in the mouse embryo and modifies expression of Gbx2, which encodes a transcription factor, during limb development. Using single-cell RNA-sequencing, we demonstrate that Gbx2 is upregulated in the limb chondrogenic mesenchyme of HACNS1 homozygous embryos, supporting the conclusion that HACNS1 alters gene expression in cell types involved in skeletal patterning. Our findings illustrate that humanized mouse models provide mechanistic insight into how HARs modified gene expression in human evolution.


Reporting for specific materials, systems and methods
Validation data including ChIP differential peak analysis data (Fig. 2, S2) and Sanger sequencing data (Fig. S1) can be found at http://noonan.ycga.yale.edu/. The Vista Enhancer Browser is publicly available at http://enhancer.lbl.gov/. Access to all additional data needed to evaluate the conclusions in the paper is provided in the manuscript and the Supplementary Materials.
No statistical methods were used to predetermine sample size for ChIP-seq, scRNA-seq, or RT-qPCR analyses. To minimize noise and provide consistent results with fewer samples, each biological replicate for the scRNA-seq and ChIP-seq experiments was derived from tissue pooled from three embryos (requiring litter-matching with wild-type embryos for ChIP-seq). Similarly, the RT-qPCR experiment used pooled tissue (4-6 embryos per genotype per tissue per time point). This strategy enabled the simultaneous processing of over 70 embryos from 6 litters. Morphometric studies and ISH analyses used large sample sizes: limb samples from 48 embryos for morphometry, and over 100 embryos from multiple litters for each genotype for ISH. ChIP-seq findings were supported by orthogonal methods as described below.
One scRNA-seq replicate from the chimpanzee ortholog line was excluded under pre-established filtering metrics because its high overall mitochondrial gene expression indicated low viability. For ISH and morphometric analyses, no data were excluded; missing data values indicate samples that could not be evaluated or measured because of tissue damage.
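A filter of this kind can be illustrated with a minimal Python sketch that computes the overall mitochondrial count fraction of a replicate and compares it with a cutoff. The function names and the 0.10 cutoff are hypothetical, and the sketch assumes mitochondrial genes carry the mouse `mt-` gene-symbol prefix; the actual pre-established metrics are not specified in this summary.

```python
from typing import Dict, List

# Assumption: mouse mitochondrial gene symbols start with "mt-" (e.g. mt-Co1).
MITO_PREFIX = "mt-"


def replicate_mito_fraction(cells: List[Dict[str, int]]) -> float:
    """Return the fraction of all counts in a replicate that map to mitochondrial genes.

    `cells` is a list of per-cell {gene_symbol: count} dictionaries.
    """
    total = 0
    mito = 0
    for counts in cells:
        for gene, n in counts.items():
            total += n
            if gene.startswith(MITO_PREFIX):
                mito += n
    # An empty replicate has no counts at all; report 0.0 rather than dividing by zero.
    return mito / total if total else 0.0


def passes_mito_filter(cells: List[Dict[str, int]], max_fraction: float = 0.10) -> bool:
    """Hypothetical cutoff: keep a replicate only if its mitochondrial fraction is at or below max_fraction."""
    return replicate_mito_fraction(cells) <= max_fraction
```

In practice the cutoff would be fixed before any data are inspected, which is what makes the exclusion criterion pre-established.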
To ensure reproducibility of the experimental findings, all experiments were performed in parallel and with identical treatment of biological samples. All ChIP-seq findings were validated using qPCR of both the sequenced samples and additional biological replicates. RT-qPCR results shown in Fig. S3 were validated with additional biological and technical replicates. All samples prepared for the ChIP-seq, RT-qPCR, ISH, and scRNA-seq data shown in the final figures were treated identically and in parallel. All attempts at replication were successful.
The biological replicates for the ChIP and scRNA experiments all required pooling of tissue from multiple embryos. In order to assign tissues to biological replicates, all samples from individual embryos were randomly assigned identification numbers that allowed for random allocation into pooling groups.
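The allocation step described above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used: the function name, the embryo labels, and the choice to form pools by sorting on the random identification numbers are all assumptions, with a pool size of three taken from the replicate design.

```python
import random
from typing import Dict, List, Optional, Tuple


def assign_pools(
    embryo_labels: List[str],
    pool_size: int = 3,
    seed: Optional[int] = None,
) -> Tuple[Dict[str, int], List[List[str]]]:
    """Give each embryo a random identification number, then group embryos into pools.

    Returns (id_by_embryo, pools). Pools are formed by walking the embryos in
    order of their random identification number, so allocation is random.
    """
    if len(embryo_labels) % pool_size != 0:
        raise ValueError("number of embryos must be a multiple of pool_size")

    rng = random.Random(seed)  # a seed makes the allocation reproducible
    ids = list(range(1, len(embryo_labels) + 1))
    rng.shuffle(ids)
    id_by_embryo = dict(zip(embryo_labels, ids))

    ordered = sorted(embryo_labels, key=lambda e: id_by_embryo[e])
    pools = [ordered[i:i + pool_size] for i in range(0, len(ordered), pool_size)]
    return id_by_embryo, pools


# Usage with hypothetical labels: six embryos yield two pools of three.
ids, pools = assign_pools([f"embryo_{c}" for c in "ABCDEF"], pool_size=3, seed=1)
```

Recording the seed alongside the identification key would allow the allocation to be regenerated later.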
Qualitative analysis of ISH results was performed using a blinded approach by randomizing embryo identification numbers prior to annotation. Morphometric data were collected blinded to genotype using randomized identification numbers. ChIP-seq, RT-qPCR, and scRNA-seq were performed without group allocation blinding, as all biological and technical replicates were processed identically and in parallel and no qualitative analyses were required for these experiments.
Specificity of H3K27ac and H3K4me2 antibodies was validated by the authors using dot blot analysis. Additional validation measures including dot blot analysis and ChIP-qPCR were performed by Active Motif (https://www.activemotif.com/documents/tds/39133.pdf ; https://www.activemotif.com/documents/tds/39913.pdf).

ChIP-seq Data deposition

Positive clones were karyotyped and only clones of verified karyotype were microinjected. Cells produced agouti coat color in the resulting founders as expected.
Cells were confirmed free of mycoplasma contamination.
All animal work was performed in accordance with approved Yale IACUC protocols (#2019-11167 and #2020-07271). Mice were maintained in a Yale Animal Resources Center (YARC) managed facility under a standard 12h light/dark cycle and environmental monitoring according to YARC policies and procedures. C57BL/6J mice were obtained from Jackson Laboratory (Stock No. 000664) for generation of edited lines and subsequent backcrossing. Pooled tissue from both male and female embryos was used in experiments. Males and females for timed matings and line propagation ranged in age from 2 months to 2 years.
No wild animals were used in this study.
No field collected samples were used in this study.
GEO accession number: GSE141471; SRA accession number: SRP234725 (BioProject PRJNA593575). These data were released with our preprint and are publicly available.
GEO submission contains bigwig files and peak files; SRA contains raw ChIP-seq and scRNA-seq data; ChIP differential peak analysis data can be found at http://noonan.ycga.yale.edu/noonan_public/Dutrow_HACNS1/ChIP_Differential_Analysis/
Two biological replicates for each tissue and genotype were sequenced for the humanized and chimpanzee ortholog lines. Four biological replicates were used for wild type in order to match litters for each humanized to wild type and chimpanzee ortholog line to wild type comparison. Each biological replicate contains tissue pooled from 3 embryos. Differential peak analysis was performed on both biological replicates of each sample using getDifferentialPeaksReplicates.pl (HOMER v4.9.1). Results were validated using ChIP-qPCR for all biological replicates with three technical replicates.
Paired-end reads (2x100bp) were generated for each sample. Samples were split across lanes by antibody only (H3K27ac, H3K4me2, and input) to avoid batch effects. We aimed for 40M read pairs per sample. Processed read statistics are available in individual peak file headers for each sample available at GEO accession GSE141471. Raw read counts are available at the linked SRA accession SRP234725.
Reads were aligned using bowtie2 (v2.2.8) with --sensitive and --no-unal and index files from mm9 or a custom mm9 index that includes the edited HACNS1 locus sequence (human or chimpanzee ortholog). Peaks were called using HOMER v4.9.1 with findPeaks and the parameter -style histone. Replicating differential peaks were identified using HOMER v4.9.1 getDifferentialPeaksReplicates.pl with parameters -DESeq2 -style histone. FastQC v0.11.5 was used to assess sequence quality. The average fractions of reads in peaks for H3K27ac and H3K4me2 samples were 0.35 and 0.61, respectively. Additional data quality metrics are available in individual peak file headers for each sample available at GEO accession GSE141471.
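For orientation, the command lines implied by the parameters above can be assembled as in the following Python sketch, which only builds argument lists and does not run any tool. The flags --sensitive, --no-unal, -style histone, and -DESeq2 are those stated above. Everything else is an assumption for illustration: all file, index, and tag-directory names are hypothetical, and the use of -x/-1/-2/-S for bowtie2, -i/-o for findPeaks, and -t/-b/-i for getDifferentialPeaksReplicates.pl reflects common usage of these tools and should be checked against their documentation for the versions cited.

```python
from typing import List


def bowtie2_cmd(index: str, fastq_r1: str, fastq_r2: str, sam_out: str) -> List[str]:
    """Paired-end alignment with the two options named in the text."""
    return [
        "bowtie2", "--sensitive", "--no-unal",
        "-x", index,          # mm9 or the custom mm9 index with the edited locus
        "-1", fastq_r1, "-2", fastq_r2,
        "-S", sam_out,
    ]


def find_peaks_cmd(tag_dir: str, input_tag_dir: str, peaks_out: str) -> List[str]:
    """HOMER peak calling in histone mode against an input control (assumed layout)."""
    return ["findPeaks", tag_dir, "-style", "histone", "-i", input_tag_dir, "-o", peaks_out]


def differential_peaks_cmd(
    target_dirs: List[str], background_dirs: List[str], input_dirs: List[str]
) -> List[str]:
    """HOMER replicate-aware differential peak call (assumed argument layout)."""
    return [
        "getDifferentialPeaksReplicates.pl",
        "-t", *target_dirs,
        "-b", *background_dirs,
        "-i", *input_dirs,
        "-style", "histone", "-DESeq2",
    ]


# Hypothetical names throughout; printing the command allows review before execution.
cmd = bowtie2_cmd("mm9_HACNS1", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample.sam")
print(" ".join(cmd))
```

Building the commands as lists keeps each argument separate, so they can later be passed to `subprocess.run` without shell quoting issues.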