Differential DNA methylation of vocal and facial anatomy genes in modern humans

Changes in potential regulatory elements are thought to be key drivers of phenotypic divergence. However, identifying changes to regulatory elements that underlie human-specific traits has proven very challenging. Here, we use 63 reconstructed and experimentally measured DNA methylation maps of ancient and present-day humans, as well as of six chimpanzees, to detect differentially methylated regions that likely emerged in modern humans after the split from Neanderthals and Denisovans. We show that genes associated with face and vocal tract anatomy went through particularly extensive methylation changes. Specifically, we identify widespread hypermethylation in a network of face- and voice-associated genes (SOX9, ACAN, COL2A1, NFIX and XYLT1). We propose that these repression patterns appeared after the split from Neanderthals and Denisovans, and that they might have played a key role in shaping the modern human face and vocal tract.


Skeletal Methylation Maps
Previously, our ability to identify differentially methylated regions (DMRs) that discriminate between human groups was confined by three main factors: (i) We had a single DNA methylation map from a present-day human bone, which was produced using a reduced representation bisulfite sequencing (RRBS) protocol, which provides information for only ~10% of CpG positions in the genome. Moreover, the fact that the archaic and present-day methylomes were produced using different technologies -computational reconstruction versus RRBSpotentially introduces a bias. (ii) The analyses included only one bone methylation map from each of the human groups, which limited our ability to identify fixed differences between the groups. Although dozens of maps from additional tissues in present-day humans were included in the analyses, this narrowed the DMRs to represent only human-specific changes that are invariable between tissues. (iii) The work did not include a great ape outgroup. Thus, when a AMH-specific change was identified, it was impossible to determine whether it happened on the AMH lineage, or in the ancestor of Neanderthals and Denisovans 1 .
To overcome these obstacles, a major goal of the current study was to significantly extend the span of our skeletal methylome collection, covering as many individuals, sexes, and bone types as we could. This included the generation of many samples, including the high-coverage sequencing of additional ancient genomes, as listed below.

Present-day human bone DNA methylation maps
We generated full DNA methylation maps from two femur head bones from present-day humans using whole-genome bisulfite sequencing (WGBS). Femora were chosen because of their abundance in present-day human samples, as well as in ancient DNA samples [2][3][4] . In addition, we collected 53 publicly available partial skeletal methylation maps. Trabecular bone tissue from femur heads were taken from two patients with osteoarthritis during a total hip replacement surgery, and after filling in a consent form as per Helsinki approval #0178-13-HMO. Importantly, the effects of osteoarthritis processes on trabecular bone are much less substantial than those on the synovium, cartilage, and subchondral bone. Bone1 was a left head of femur taken on August 11, 2014 from a 66 years old female and Bone 2 was a right head of femur taken on September 2, 2014 from a 63 years old female.
DNA was extracted from bones using QIAamp® DNA Investigator kit (56504, Qiagen). Bones were cut to thin slices (0.2-0.5 mm) and then thoroughly washed (X5) with PBS, to clean samples from blood. Bones were crushed with mortar and pestle in liquid nitrogen, and 100 mg bone powder was taken to extract DNA according to the protocol Isolation of Total DNA from Bones and Teeth of the DNA Investigator kit.
Whole-genome bisulfite sequencing was conducted at the Centre Nacional d'analysis Genomico (CNAG) as described in 5 . After cell sorting, genomic DNA libraries were constructed using the Illumina TruSeq Sample Preparation kit (Illumina) following the manufacturer's standard protocol. DNA was then exposed to two rounds of sodium bisulfite treatment using the EpiTect Bisulfite kit (QIAGEN), and paired-end DNA sequencing was performed using the Illumina Hi-Seq 2000. We used the GEM mapper 6 with two modified versions each of the human (GRCh37) and viral reference genomes: one with all C's changed to T's and another with all G's changed to A's. Reads were fully converted in silico prior to mapping to the modified reference genomes, and the original reads were restored after mapping. Although methylation state should not depend on read position, positional biases have been previously reported 7 . We observed that the first few bases from each read showed a slightly higher probability of being called as methylated, and we thus trimmed the first ten bases from each read (M-bias filtering).. Heterozygous positions, positions with a genotype error probability greater than 0.01, and positions with a read depth greater than 250 were filtered out. Only cytosines with six or more reads informative for methylation status were considered. On average, half of the reads from either strand will be informative for methylation status at a given position, so minimum coverage is typically greater than 12. Methylated and unmethylated cytosine conversion rates were determined from spiked-in bacteriophage DNA (fully methylated phage T7 and unmethylated phage lambda). Five samples were excluded based on conversion rates <0.997, supported by visual inspection of CG and non-CG methylation plots. The over-conversion rates for all samples based on methylated phage T7 DNA were ~5%.
Sequence quality was evaluated using FastQC software v0.11.2. TRIMMOMATIC v.0-32 was used to filter low quality bases with the following parameters: -phred33 LEADING:30 TRAILING:30 MAXINFO:70:0.9 MINLEN:70. Paired-end sequencing reads were mapped to bisulfite converted human (hg19) reference genome using Bismark v0.14.3 and bowtie2 v2.2.4 not allowing multiple alignments and using the following parameters: --bowtie2 --non_bs_mm --old_flag -p 4. Potential PCR duplicates were removed using Bismark's deduplicate_bismark_alignment_output.pl Perl program. Bismark's bismark_methylation_extractor script was used to produce methylation calls with the following parameters: -p --no_overlap --comprehensive --merge_non_CpG --no_header --bedGraph -multicore 2 --cytosine_report. Examination of the M-bias plots led us to ignore the first 5 bp of both reads in human samples (Supplementary Figure 1). Custom scripts were used to summarize methylation levels at CpG sites based on the frequencies of methylated and unmethylated ancient DNA laboratory at the Faculty of Dental Medicine. The skull was cleaned with an excess of 10% bleach (equal to 0.6% Sodium hypochlorite) and then subjected to UV radiation for 30 minutes. The cortical layer on the temporal surface (facies temporalis) of the zygomatic bone (ZB) was removed by low-speed drilling using a Wolf Multitool Combitool Rotary Multi Purpose Tool equipped with a sterile dental burr. Another sterile burr was used to obtain powder of the subcortical trabecular bone within the body of the zygoma. The powder was collected onto a 10 x 10 cm aluminum foil sheet pretreated with a 10% bleach solution and then transferred into a sterile 1.5 ml Eppendorf tube for subsequent DNA extraction. Altogether, three samples were

Chimpanzee bone DNA methylation maps
Overall, we produced six methylation maps from bones of six common chimpanzee (Pan troglodytes) individuals. They include one WGBS of a wild chimpanzee, one RRBS of an infant chimpanzee, and four 850K methylation arrays of captive chimpanzees.
Chimpanzee tissue samples included in this study were opportunistically collected at routine necropsy of these animals. No animals were sacrificed for this study, and no living animals were used in this study.

WGBS of a chimpanzee bone
We used a rib bone of a 47-year-old female Chimpanzee provided from the Biobank of the Biomedical Primate Research Centre (BPRC), The Netherlands. The postmortem interval was approximately 10-12 hours. The bone was collected during the necropsy procedure and immediately frozen and stored at -80 ⁰C.
DNA was extracted in a dedicated ancient DNA laboratory at the Institute of Evolutionary Biology in Barcelona, where no previous work on great apes has ever been conducted. Standard precautions to avoid and monitor exogenous contamination such as frequent cleaning of bench surfaces with bleach, use of sterile coveralls, UV irradiation and blank controls were taken during the process. 200 mg of bone powder were obtained by drilling and the sample was Analysis was performed similarly to Bone1 and Bone2, with the exception that the BSreads were mapped to bisulfite converted chimpanzee (panTro4) reference genome, and we ignored the first 5bp of read1 and the first 44 bp of read2 in the chimpanzee sample (Supplementary Figure 1).
Methylation data were deposited in NCBI's Gene Expression Omnibus and are accessible through GEO accession number GSE96833.

RRBS of a chimpanzee bone
We used two unidentified long bone fragments that belonged to a newborn wild chimpanzee infant who died during a documented infanticide event at Gombe National Park on 9 March 2012. The infant was known to be the offspring of a chimpanzee called Eliza and was partially eaten by an adult female and her family. The sample was collected from the ground about 48 hours after the infant's death and stored in RNAlater solution until arrival at Arizona State University (ASU). At ASU the sample was stored at 4°C until extraction.
Sampling and DNA extractions were conducted at the ASU Ancient DNA Laboratory, a Class 10,000 clean-room facility in a separate building from the Molecular Anthropology Laboratory.
Precautions taken to avoid contamination included bleach decontamination and UV irradiation of tools and work area before and between uses, and use of full body coverings for all researchers.
The bone samples were pulverized together in December 2012 using a SPEX CertiPrep Freezer Mill. Three DNA extractions were conducted using 50-100 mg of bone powder (Supplementary Table 2) and following the extraction protocol by Rohland and Hofreiter 9 . Two extraction blank controls were included to monitor contamination of the extraction process. One µL each of the sample extract and the blank control were used for fluorometric quantification with the Qubit 2.0 Broad Range assay. All extracts were combined for a total volume of 345 µL and approximately 0.652 µg of total DNA. RRBS libraries were generated according to Boyle et al. 10 . 100-200 ng genomic DNA was digested with MspI. Subsequently, the digested DNA fragments were end-repaired and adenylated in the same reaction. After ligation with methylated adapters, samples with different adapters were pooled together and were subjected to bisulfite conversion using the EpiTect Bisulfite kit (QIAGen) per the manufacturer's recommendations with the following modification: after first bisulfite conversion, the converted DNA was treated with sodium bisulfite again to guarantee that conversion rates were no less than 99%. Two third of bisulfite converted DNA was PCR amplified and final RRBS libraries were sequenced in an Illumina HiSeq 2000 sequencer (Supplementary Data 1). Methylation data were deposited in NCBI's Gene Expression Omnibus and are accessible through GEO accession number GSE96833.

850K DNA methylation arrays
Four chimpanzee cadavers from captive colonies at the Southwest National Primate Research Center in Texas were used. Femora were opportunistically collected at routine necropsy of these animals and stored in -20°C freezers at the Texas Biomedical Research Institute after dissection.
These preparation and storage conditions ensured the preservation of skeletal DNA methylation patterns.
Samples were then transported to ASU and DNA was extracted from the femoral trabecular bone using a phenol-chloroform protocol optimized for skeletal tissues 11 . From the distal femoral condyles, trabecular bone was collected using coring devices and pulverized into bone dust using a SPEX SamplePrep Freezer/Mill. Specifically, bone cores were obtained from a transverse plane through the center of the medial condyle on the right distal femur, such that the articular surface remained preserved. Cortical bone was removed from these cores using a Dremel (Supplementary Table 3 Raw fluorescent data were normalized to account for the noise inherent within and between the arrays themselves. Specifically, we performed a normal-exponential out-of-band (Noob) background correction method with dye-bias normalization to adjust for background fluorescence and dye-based biases and followed this with a between-array normalization method (functional normalization) which removes unwanted variation by regressing out variability explained by the control probes present on the array as implemented in the minfi package in R which is part of the Bioconductor project. This method has been found to outperform other existing approaches for studies that compare conditions with known large-scale differences 12 , such as those assessed in this study.
After normalization, methylation values (β values) for each site were calculated as the ratio of methylated probe signal intensity to the sum of both methylated and unmethylated probe signal intensities. These β values range from 0 to 1 and represent the average methylation levels at each site across the entire population of cells from which DNA was extracted (0 = completely unmethylated sites, 1 = fully methylated sites).
Every β value in the Infinium platform is accompanied by a detection P-value, and those with failed detection levels (P-value > 0.05) in greater than 10% of samples were removed from downstream analyses.
The probes on the arrays were designed to specifically hybridize with human DNA, so our use of chimpanzee DNA required that probes non-specific to the chimpanzee genome, which could produce biased methylation measurements, be computationally filtered out and excluded from downstream analyses. This was accomplished using methods modified from Hernando-Herraez et al. 13 . Briefly, we used blastn to map the 866,837 50bp probes onto the chimpanzee genome (Assembly: Pan_tro_3.0, Accession: GCF_000001515.7) using an e-value threshold of e -10 . We only retained probes that successfully mapped to the genome, had only 1 unique BLAST hit, targeted CpG sites, had 0 mismatches in 5bp closest to and including the CpG site, and had 0-2 mismatches in 45bp not including the CpG site. This filtering retained 622,819 probes.
Additionally, β values associated with cross-reactive probes, probes containing SNPs at the CpG site (either human or chimpanzee), probes detecting SNP information, probes detecting methylation at non-CpG sites, and probes targeting sites within the sex chromosomes were removed using the minfi package in R. This filtering retained a final set of 576,505 probes.

Bisulfite-PCR of chimpanzee cranial bones
Postmortem frontal skull bones from two different chimpanzees (chimpanzee 1 and chimpanzee 2) were provided by the Biomedical Primate Research Centre (BPRC, The Netherlands). Bones were opportunistically collected during routine necropsy of these animals and stored at -80°C. CloneJET PCR Cloning Kit (Thermo Scientific, K1231) was used to clone the purified PCR products into a pJET1.2/blunt Cloning Vector following the Blunt-End Cloning Protocol described in the manufacturer's instructions. 3µl (chimpanzee 1 and 2) and 3µl (chimpanzee 3 and 4) of each cloning reaction product were used for transformation of DH5α Competent Cells

Reconstructing ancient DNA methylation maps
In a dedicated clean room at Harvard Medical School, powder was extracted from the root of a lower third molar of the Mesolithic La Braña 1 individual (5983-5747 calBCE (6980±50 BP, Beta-226472)), from which a non-UDG-treated library was previously sequenced to 3.5x coverage 15 . Two UDG-treated libraries from the same individual were later generated and enriched for approximately 1.2 million single targeted polymorphisms and sequenced to an average of 19.5x coverage at these position 16 . In this study, we carried out shotgun sequencing of one of the same UDG-treated libraries from this individual on a NextSeq500 instrument using 2 x 76bp paired end sequences. Following the mapping protocol described previously 16 , we trimmed adapter sequences, only processed read pairs whose ends overlapped by at least 15 bp (allowing for one mismatch) so that we could confidently merge them, and then mapped to the human reference sequence hg19 using the command samse in BWA (v0.6.1). We removed duplicated sequences by identifying sequences with the same start and stop position and orientation in the alignment, and picking the highest quality one. After restricting to sequences with a map quality of MAPQ ≥ 10, and sites with a minimum sequencing quality (≥20), we had an average coverage measured at the same set of approximately 1. Neolithic individual came from a community that practiced farming, and was anthropologically determined to be a male aged 6-10 years at the time of death (the sex was confirmed genetically).
The direct radiocarbon date was 6426-6236 calBCE (7460±50 BP, Poz-82231). In a dedicated clean room at Harvard Medical School, a UDG-treated library was prepared from this powder, which was previously enriched for about 1.2 million SNP targets, sequenced to 13.5x average coverage, and published in 16 . We shotgun sequenced the same library on nine lanes of a HiSeqX10 sequencing with 100bp paired reads. On data processing, we merged overlapping read pairs, trimmed Illumina sequencing adapters, and dropped read pairs that did not have sample barcodes (up to 1 mismatch) or cannot be unambiguously merged. We then aligned merged reads with BWA against human reference genome GRCh37 (hg19) plus decoy sequences, and combined all nine lanes of data and removed duplicate molecules, achieving an average of 24.3x coverage evaluated on the 1.2 million targets. This data is available under GEO accession number: GSE96833, with raw reads deposited under SRA accession number: SRX3194436. date; however, these were sequenced to a relatively low coverage (<5x), and thus, only crude methylation maps could be reconstructed from them. CT ratio was computed for every CpG position along the hg19 (GRCh37) human genome assembly, for each of the samples 1 .
In order to exclude from the analyses positions that potentially represent pre-mortem CT mutations rather than post-mortem deamination, the following filters were applied: (i) Positions where the sum of A and G reads was greater than the sum of C and T reads were excluded. (ii) For genomes that were produced using single-stranded libraries (i.e., Ust'-Ishim, Altai Neanderthal, Denisovan, Vindija Neanderthal and ~1/3 of the Loschbour library), positions where the GA ratio on the opposite strand was greater than 1/(average single strand coverage) were excluded. This fraction represents a threshold of one sequencing error allowed per position.
For Loschbour, this was performed only on the fraction of reads that came from the single stranded library. (iii) For all genomes, positions with a CT ratio > 0.25 were discarded. For the Vindija Neanderthal, this threshold was raised to 0.5, due to its relatively low coverage (~7x).
(iv) Finally, a maximum coverage threshold of 100 reads was used to filter out regions that are suspected to be PCR duplicates.
In all genomes, excluding Vindija, a fixed sliding window of 25 CpGs was used for smoothing of the CT ratio. This allowed for an unbiased scanning of differentially methylated regions (DMRs) that is not affected by the size of the window. Due to its relatively low coverage, we extended the sliding window used on the Vindija genome to 50 CpGs. This extended window is not expected to introduce a bias, as this genome was not used for DMR detection, but only for subsequent filtering that was applied equally to all genomes (see later).
CT ratio was translated to methylation percentage using linear transformation determined from two points: zero CT ratio was set to the value 0% methylation, and mean CT ratio in completely methylated (100% methylation) CpG positions in modern human bone reference (hereinafter μ100) was set to the value 100% methylation. Positions where CT ratio > μ100 were set to 100% methylation. For genomes that were extracted from bones, the modern Bone 2 WGBS map, which is the one with the higher coverage between the two WGBS modern bone maps, was used to determine μ100. For genomes that were extracted from teeth, there was no available modern reference methylation map, and therefore, we transformed the CT ratio into methylation percentage based on the assumption that the genome-wide mean methylation is similar to bone tissue. Thus, the genome-wide mean CT ratio represents 75% methylation, which is the genome-wide mean of measured methylation in the Bone 2 reference map. This was accomplished by setting μ100 to 1.33 x mean genome-wide CT ratio.

DMR detection
The DMR detection algorithm is comprised of five main steps. We hereby provide an overview of the algorithm followed by a detailed description of each step. The overall goal of this pipeline is to detect differential methylation, assign it to the lineage on which it arose and filter out within-lineage variation.
Step 1: Two-way comparisons. To avoid artifacts that could potentially be introduced by comparing DNA methylation maps that were produced using different technologies, our core analysis relied on the comparison of the three reconstructed maps of the Altai Neanderthal, Denisovan, and Ust'-Ishim. Each of the samples was compared to the other two in a pair-wise manner, as a raw CT ratio map against a reconstructed methylation map, and vice versa. This reciprocal comparison insured that the reconstruction process does not introduce biases to one of the groups. The minimum methylation difference threshold was set to 50%, spanning >50 CpGs.
Step 2: Three-way comparisons. This step classifies to which of the three hominins the DMR should be attributed. This step is done by overlapping the three lists of DMRs found in Step 1.
For example, a DMR that is detected between the Neanderthal and Ust'-Ishim and also between the Denisovan and Ust'-Ishim is considered specific to Ust'-Ishim.
Step 3: FDR filtering. Various factors could introduce noise to the reconstruction process, including the stochasticity of the deamination process, the use of a sliding window, and variations in read depth within a sample. We ran simulations that mimic the post-mortem degradation processes of ancient DNA, then reconstructed methylation maps from the simulated deamination maps and finally compared them to the original map and identified DMRs. Any differences in methylation levels between the simulated map and the original reference map stem from noise. Thus, running the same DMR-detection algorithm on the simulated map vs. the reference map, enables an estimation of the false discovery rate. We set the DMR-detection thresholds so that FDR < 0.05.
Step 4: Lineage assignment. The chimpanzee methylation maps were used to polarize the DMRs.
For each DMR, methylation levels in the chimpanzee were compared to those of the three hominin groups. For example, if methylation levels in the chimpanzee samples clustered with the archaic humans, the DMR was assigned to the AMH lineage.
Step 5: Within-lineage variability filtering. To determine whether a DMR represents an individual within a group, or is shared by the entire group, we used a total of 67 AMH, archaic and chimpanzee methylation maps. We used a conservative approach where DMRs in which methylation levels in one group overlap (even partially) the methylation levels in another group were discarded. As 59 out of the 67 maps belong to AMHs, our ability to filter out variation within this group was better, resulting in fewer DMRs along this lineage. Several various measures were used to ascertain that a DMR along a lineage does not represent a sex-, bone-, age-, technology or disease-specific DMR.

DMR-detection algorithm
We developed an algorithm specifically designed to identify DMRs between a deamination map and a full methylome reference. Let enumerate the CpG positions in the genome. In the deamination map, let be the number of T's at the C position + the number of A's in the opposite strand at the G position, i.e., it counts the total number of T's that appear in a position that is originally C, in the context of a CpG dinucleotide. We similarly use to count the total number of C's that appear in a position that is originally C, in the context of a CpG dinucleotide.
The CT ratio is defined as / , where = + . Let and (both between zero and one) be the methylation of this position in the reference genome and in the reconstructed one, respectively. If we denote by the deamination rate, assumed to be constant throughout the genome, and if we assume that deamination of C into T is a binomial process with probability of success , we get Our null hypothesis is that the th CpG is not part of a DMR, namely that = . The alternative hypothesis states that this CpG is part of a DMR. The definition of this statement is where Δ is some pre-specified threshold. In other words, under the alternative hypothesis we get that ≥ + Δ if the site has low methylation in the reference genome, and ≤ − Δ if it has high methylation in the reference genome.

Per-site statistic
Let us start with the first option, testing whether ≥ + Δ when is low. A log-likelihoodratio statistic would be Similarly, we can test whether ≤ − Δ when is high using the log-likelihood-ratio We used the value Δ = 0.5 for all samples. The value of , the deamination rate, was estimated using the overall CT ratio in CpG positions whose methylation level is 1 in the modern human Bone 2 WGBS methylation map, after exclusion of putative pre-mortem substitutions, as described in the reconstruction procedure section (Supplementary Data 1).

Detecting DMRs
The statistics ℓ + and ℓ − quantify how strongly the estimated methylation in position deviates from . Next, we use these values to identify DMRs using the cumulative-sum procedure explained below. The process is repeated twice: on the statistic ℓ + to identify DMRs where the sample has elevated methylation with respect to the reference, and on the statistic ℓ − to identify DMRs where the sample has reduced methylation with respect to the reference.
For convenience, we explain the cumulative-sum procedure in the context of ℓ + , but an essentially identical procedure is used for ℓ − . We define a new vector + by the recursion Under the null hypothesis, ℓ + has a negative expectation which produces a negative drift that keeps + at zero, or close to zero, levels. Under the alternative hypothesis the expectation is positive, hence the drift over a DMR is positive, leading to an elevation in the values of + .
Therefore, our next step is to find all intervals

Filtering DMRs
Of course, + may increase locally due to randomness, and thus a putative DMR may not reflect a true DMR. To filter out such intervals, we used two strategies. First, we applied a set of filters to assure that the putative DMRs have reasonable biological properties. Second, we cleaned the remaining putative DMRs by applying a false discovery rate (FDR) procedure. In the first strategy, we applied two filters: (i) Putative DMRs that harbor less than 50 CpG positions, thus are shorter than twice the smoothing window size, were removed. (ii) To avoid situations where two consecutive CpG sites whose genomic locations are remote appear on the same DMR, we modify the vector + as follows. Let , be the distance along the genome (in nucleotides) between CpG sites and . Then, for every site such that , −1 > we set + = 0. We used = 1000 nt for all samples.
To further remove putative DMRs that are unlikely to reflect true DMRs, we eliminated all DMRs where + < + . Here, + is the maximum value of + in the interval as defined earlier, and + is a threshold determined using a false discovery rate (FDR) procedure, see the Filtering out noise section below.

Testing the algorithm
To verify that the approach above results in a low number of false positives, we applied the procedure to deamination maps, when compared to themselves in the form of reconstructed methylomes. As expected, we obtained a negligible number of DMRs, ranging between 0.4% and 1% of the number of DMRs detected between the humans.

Two-way DMR detection
In order to avoid artifacts that could potentially be introduced by comparing DNA methylation maps that were produced using different technologies, our core analysis relied on the comparison of the three reconstructed maps of the Altai Neanderthal, Denisovan, and Ust'-Ishim. These are all high-resolution maps that were derived from genomes sequenced to high coverage (Supplementary Data 1). In particular, the Ust'-Ishim methylome is of exceptional quality due to its high coverage and deamination rate (Supplementary Data 1). Also, going through the same post-mortem degradation processes, the Ust'-Ishim cellular composition is likely to be similar to that of the Neanderthal and Denisovan.
In order for a deamination map to serve as a reference in the comparison, we have transformed its CT ratio values into methylation values (see The reconstruction procedure section above).
To remove potential bias that could be introduced through the comparison of a reconstructed methylation map to a deamination map, we ran each two-way comparison twice: once with the methylation map of sample 1 against the deamination map of sample 2, and once with the deamination map of sample 1 against the methylation map of sample 2 (Supplementary Figure   3). Therefore, the comparison of three genomes required a total of six two-way comparisons: Ust

Three-way DMR detection
In order to identify DMRs where one group of humans (hereinafter, hominin 1) differs from the other two human groups (hereinafter, hominin 2 and hominin 3), we set out to find those DMRs that were detected both between hominin 1 and 2, and between hominin 1 and 3. To this end, we compare the two lists (hominin 1 vs. hominin 2 and hominin 1 vs. hominin 3) and look for overlapping DMRs 1 . An overlapping DMR exists when a DMR from one list partially (or fully) overlaps a DMR from the second list. Only the overlapping portion of the two DMRs from the two-lists was taken.

Filtering out noise
There are different factors that potentially introduce noise into the reconstruction process. These include the stochasticity of the deamination process, the use of a sliding window to smooth the CT signal, and variations in read depth. In order to account for these factors and estimate noise levels, we ran simulations that mimic the post-mortem degradation processes of ancient DNA, then reconstructed methylation maps from the simulated deamination maps and finally compared them to the original map and identified DMRs.
The simulation process starts with a methylation map, where the measured or reconstructed methylation at position is and is assumed the true methylation. Given that is the coverage at this position, we use the binomial distribution (1)

DMRs separating chimpanzees and humans
To identify DMRs that separate chimpanzees from all human groups (both modern and archaic), we first compared the chimpanzee WGBS bone methylome to each of two present-day WGBS maps (those of Bone1 and Bone2). This was done by scanning the chimpanzee map using a sliding window of 25 CpGs, in intervals of one CpG position. In each window, we counted the number of methylated and unmethylated reads in each sample, and computed a P-value using We next developed a second, stricter, scheme by also using the chimpanzee 850K DNA methylation arrays datasets. As the probes cover just part of the CpGs in a DMR, we need to adjust the DMR methylation level in order to allow a meaningful comparison of 850K methylation data to full methylation maps. If we mark by the CpGs in a DMR that are covered This procedure was applied to DMRs covered by at least one probe (~65% of DMRs  There are pros and cons to each of these approaches. Using more chimpanzee datasets allow for more informative process. However, 850K methylation array probes are distributed unevenly across the genome. Although most DMRs are covered by at least one probe (mean number of probes per DMR: 1.7, median: 1, maximum: 64), many are nonetheless not covered. AMH On one hand, lineage assignment of DMRs for which we have array data is more robust and less prone to misclassification. On the other hand, DMRs with array data are more likely to be filtered out, as there is more power to detect variability. This could potentially alter the genomic distribution of DMRs. Therefore, we use both approaches throughout the paper. In analyses where it is important to maintain an unbiased distribution of DMRs we only use the chimpanzee WGBS map for polarization, and AMH bone WGBS maps for filtering (see next chapter), whereas in analyses where it is more important to minimize variability, or where we look at specific DMRs, we use the stricter approach. The chimpanzee RRBS data was adjusted using the same technique. However, it was not used for lineage assignment, but rather only as a source for additional information on DMRs. This is because this protocol particularly targets unmethylated CpGs, and is therefore too biased for lineage assignment.

Removing DMRs with high within-group variability
Our three-way DMR detection algorithm above produces a list of DMRs where one of the three hominins (Ust'-Ishim, Altai Neanderthal or Denisovan) is significantly different from the other two. However, such DMRs could stem from variability within any of the groups, and in such cases cannot be regarded as truly differentiating between the human groups. Some variability may be removed during the process described above (see the Determining the lineages where DMRs originate section), but even DMRs whose origin can be assigned to a particular lineage do The second approach adds to this the 52 450K methylation array samples, as well as the three reconstructed methylation maps from teeth (i.e., Loschbour, Stuttgart and La Braña 1). As described above, using also methylation probes for filtering DMRs provides more power, but can also introduce biases. Thus, this filtering was used for most analyses, except those where unbiased genomic distribution of DMRs is critical. Probe methylation data was corrected as described in equation (2) A general concern in working with DNA methylation data is that DMRs that are specific to one group do not necessarily represent an evolutionary change, but rather reflect a characteristic such as technology used to measure methylation, tissue, sex, disease or age that is shared by individuals in this group and not by others. We take two complementary approaches to ascertain that the DMRs we report are not driven by these factors: (a) for the top DMGs, we match the samples for the above factors and test whether the hypermethylation of AMHs is still observed.
To this end, we compared Ust'-Ishim (adult femur with no known diseases, methylation map produced using our reconstruction method) to the Vindija Neanderthal (adult femur with no known diseases, methylation map produced using our reconstruction method), and we also (b) throughout the pipeline, we take only DMRs where one human group clusters completely outside the other groups regardless of tissue, sex, disease, age or technology. Thus, these factors are unlikely to drive the reported methylation changes. This approach is particularly useful in AMH-derived DMRs, where each group of samples (i.e., AMH samples vs. archaic and chimpanzee samples) include both males and females, juveniles and adults, and they come from femora, ribs, tibia, skulls, and teeth. Thus, it is unlikely that the DMRs that differentiate these groups reflect variability that stems from these parameters 18 (Fig. 1a-c). Archaic-derived DMRs and Neanderthal-derived DMRs are also unlikely to reflect differences in the above parameters, as in these DMRs, the Vindija Neanderthal sample (adult, femur bone) is clustered with the Altai Neanderthal sample (juvenile/adult, phalanx), and not with AMHs, where most samples are from femora of adult females. Denisovan-derived DMRs, on the other hand, are more likely to stem from age or bone type differences than other types of DMRs. This is because the Denisovan sample is the only finger bone, and it comes from a child (6-13.5 years) (Supplementary Data 1).
Thus, we cannot rule out the possibility that some of the Denisovan-derived DMRs reflect finger-specific, rather than lineage-specific methylation patterns. These DMRs could also possibly reflect age-specific differences, but this is less likely, as the AMH I1583 sample 16 Fig. 1).
Note that we do not generally expect the number of DMRs along a lineage to be proportional to the length of the lineage, as this number is determined by several factors. First, the statistical power to detect DMRs depends on coverage and deamination levels. Thus, our ability to detect DMRs was lowest in the Denisovan, and highest in Ust'-Ishim. Second, the ability to filter out within-population variability was substantially higher along the AMH lineage, to which most samples belong. While filtering out such variability, we also exclude variability that exists across both AMH and archaic populations. This filtering also discards genomic regions that are variable between sexes, bone types and regions where methylation patterns tend to be more stochastic.
Variability that exists exclusively along the Neanderthal lineage was partially removed using the Vindija Neanderthal sample, which comes from a different bone (femur vs. phalanx) and age (adult vs. juvenile/adult). Along the Denisovan-lineage, on the other hand, such variability could not be filtered out using our array of samples (Fig. 1).
We also repeated the Gene ORGANizer analyses (see the Gene ORGANizer analysis section) after removal of 20 DMRs that overlap regions which were shown to change methylation during osteogenic differentiation 19 . We show that the enrichment of voice-affecting genes holds, and thus, the differentiation state of cells in the samples is unlikely to explain the results we report.

Comparison to previous reports
We have previously reported that compared to present-day humans, the HOXD cluster of genes is significantly hypermethylated in the Neanderthal and Denisovan samples 1 . Using the current methylation maps, we show that this observation holds (Supplementary Figure 2b). Adding chimpanzee data, we see that similarly to AMHs, chimpanzee samples are also hypomethylated compared to archaic humans. This suggests that the hypermethylation arose along the archaichuman lineage. However, we find that the Ust'-Ishim individual is an outlier among modern humans, and that his methylation levels are closer to the Neanderthal than to modern humans, as was also shown by Hanghøj et al. 20 . The Neanderthal and Ust'-Ishim individuals are found >2 standard deviations from the mean observed methylation in modern humans. This suggests that although the Neanderthal is hypermethylated compared to almost all modern humans, she is not found completely outside modern human variation. The Denisovan, on the other hand, is found even further away, and significantly outside the other populations. Given this, the HOXD DMR was classified as Denisovan-derived (Supplementary Data 2). The Ust'-Ishim remains include a single femur, and to our knowledge, it was not compared morphologically to other humans.
Thus, further analysis is needed in order to determine whether the hypermethylation of the Ust'-Ishim individual compared to other AMHs is manifested in morphological changes as well.
Moreover, as this DMR is classified as Denisovan-derived, we cannot rule out the possibility that it is driven to some extent by age or bone type differences.
Compared to the previously reported DMRs 1 , in this study we found four times as many AMHand archaic-derived DMRs (2,805 full bone DMRs compared to 891) and roughly twice as many DMRs in skeletal tissues. In the previous study, we were therefore able to extrapolate and find trends that extend beyond the skeletal system, such as neurological diseases. In this paper we focus on the skeletal system, hence the different appearance of the body map (Fig. 2b,c) Data 3). Finally, we also find that similarly to the previous report, these DMRs are linked to diseases more often (23.1% compared to the genome average of 10.8%, DAVID OMIM_DISEASE DB).

Computing correlation between methylation and expression
In order to identify regions where DNA methylation is tightly linked with expression levels, we As no expression data were available for Ust'-Ishim, Bone1 and Bone2, we approximated their NFIX expression level by taking the average of NFIX expression from three osteoblast RNA-seq datasets that were downloaded from GEO accession numbers GSE55282, GSE85761 and GSE78608. RNA-seq data for chondrocytes was downloaded from the ENCODE project, GEO accession number GSE78607 and plotted against measured methylation levels in primary chondrocytes (see the Human primary chondrocyte validation chapter). Notably, even though the expression and methylation data come from different individuals, plotting them against one another positions them only ~one standard deviation from the expression value predicted by the regression line (Fig. 5b). Future studies providing RNA expression levels for the laryngeal skeleton and vocal folds might provide further information on the methylation-expression links of these genes.

Studying the function of DMGs
Gene ontology and expression analyses were conducted using Biological Process and UNIGENE expression tools in DAVID, using an FDR threshold of 0.05.

Gene ORGANizer analysis
Similarly to sequence mutations, changes in regulation are likely to be unequally distributed across different body systems, owing to negative and positive selection, as well as inherent traits of the genes affecting each organ. Thus, we turned to investigate which body parts are affected by the DMGs. To this end, we ran the lists of DMGs in Gene ORGANizer 22 , which is a tool that links genes to the organs they affect, through known disease and normal phenotypes. Thus, it allows us to investigate directly the phenotypic function of genes, to identify their shared targets and to statistically test the significance of such enrichments. We ran the lists of DMGs in the ORGANize option using the default parameters (i.e., based on confident and typical+non-typical gene-phenotype associations).
When we ran the list of skeletal AMH-derived DMRs, we found 11 significantly enriched body parts, with the vocal folds and the larynx being the most enriched parts (x2.11 and x1.68, FDR = 0.017 and FDR = 0.046, respectively). Most other parts belonged to the face (teeth, forehead, lips, eyelid, maxilla, face, jaws), as well as the pelvis and nails (Fig. 2c, In order to examine whether such trends could arise randomly from the reconstruction method, we repeated the analysis on the previously described 100 simulations. We ran all simulated DMGs (4,153) in Gene ORGANizer and found that no enrichment was detected, neither for voice-related organs (vocal folds: x0.99, FDR = 0.731, larynx: x.1.02, FDR = 0.966, FDR = 0.966), nor for any other organ.

Validation of face and larynx enrichment in Gene ORGANizer
To test whether the enrichment of the face and larynx could be attributed to the fact that the analyses are based on skeletal tissues, we tested whether the proportion of genes related to the face, larynx, vocal folds and pelvis within AMH-derived skeleton-related DMGs is higher than compared to the fraction expected by chance (5.17x, P = 3.4 x 10 -4 , hypergeometric test). As a control, we then tested how this 5-fold enrichment compares to non-craniofacial features. We used blood-related GWAS as a representative of general non-craniofacial GWAS. We extracted from the GWAS catalog 22 blood-related traits (the same number as extracted for craniofacial features), by taking the first 22 traits that appear in a search for the term blood and applying a threshold of P < 10 -8 . We then used these genes as a background control for the craniofacial enrichment. We observed a 3.86x enrichment of DMGs with regard to craniofacial-vs. noncraniofacial-associated genes (P = 0.01, chi-square test).
Additionally, we conducted a permutation test on the list of 129 AMH-derived DMGs that are linked to organs on Gene ORGANizer, replacing those that are linked to the skeleton with randomly selected skeleton-related genes. We then ran the list in Gene ORGANizer and computed the enrichment. We repeated the process 100,000 times and found that the enrichment levels we observed within AMH-derived DMGs are significantly higher than expected by chance for the laryngeal and facial regions, but not for the pelvis (P = 8.0 x 10 -5 , P = 3.6 x 10 -3 , P = 8.2 x 10 -4 , and P = 0.115, for vocal folds, larynx, face and pelvis, respectively, permutation test, Potentially, longer genes have higher probability to overlap DMRs. Indeed, DMGs tend to be longer (148 kb vs. 39 kb, P = 9.9 x 10 -145 , t-test). We thus checked the possibility that genes affecting the larynx and face tend to be longer than other genes, and are thus more likely to contain DMRs. We found that length of genes could not be a factor explaining the enrichment within genes affecting the larynx, as these genes tend to be shorter than other genes in the genome We also investigated the possibility that (for an unknown reason) the DMR-detection algorithm introduces positional biases that preferentially identify DMRs within genes affecting the voice or face. To this end, we simulated stochastic deamination processes along the Ust'-Ishim, Altai Neanderthal, and Denisovan genomes, reconstructed methylation maps, and ran the DMRdetection algorithm on these maps. We repeated this process 100 times for each hominin and found no enrichment of any body part, including the face, vocal folds, or larynx (1.07x, 1.07x, We also examined if pleiotropy could underlie the observed enrichments. To a large extent, the statistical tests behind Gene ORGANizer inherently account for pleiotropy 22 , hence the conclusion that the most significant shared effect of the AMH-derived DMGs is in shaping vocal and facial anatomy is valid regardless of pleiotropy. Nevertheless, we tested this possibility more directly, estimating the pleiotropy of each gene by counting the number of different Human Phenotype Ontology (HPO) terms that are associated with it across the entire body 23 . We found that DMGs do not tend to be more pleiotropic than the rest of the genome (P = 0.17, t-test), nor do differentially methylated voice-and face-affecting genes tend to be more pleiotropic than other DMGs (P = 0.19 and P = 0.27, respectively, t-test).
Next, we tested whether the process of within-lineage removal of variable DMRs and the Importantly, the link between genetic alterations and phenotypes related to the voice is complex. Some brain-related disorders (i.e., clinical disorders that affect the brain) result in alterations to the voice, the mechanism in which is very difficult to pin down. Although the mechanism leading to voice alterations (either in its pitch, timbre, volume or range) in some of the genes we report is unknown, many of the disorders are skeletal, suggesting the mechanism is related to anatomical changes to the vocal tract. Such changes could also affect more primary functions of the larynx, such as swallowing and breathing. However, the enrichment we observe in Gene ORGANizer shows these genes were also shown to drive vocal alterations in the disorders they underlie 22,23 . Voice and speech alterations were also shown to be driven by cultural, dietary and behavioral changes affecting bite configuration 24 . Here too, these factors are unlikely to underlie the vocal alterations in the genes we report, as individuals from the same family as the individual with the disorder, who do not carry the dysfunctional allele, were not reported to present any vocal phenotypes.
The larynx is an organ which is primarily involved in breathing and swallowing in mammals. In humans, the larynx is also used to produce complex speech, but not every change to the larynx necessarily affects speech. Despite these additional functions, the genes reported by Gene ORGANizer and HPO were specifically associated with voice alterations, directly or indirectly, suggesting that although they could have additional effects, their effect on the voice is their most shared function.

Overlap with enhancer regions
To further test whether the AMH-derived DMRs overlap skeletal regulatory regions, we examined the previously reported 403,968 human loci, where an enrichment of the active enhancer mark H3K27ac was detected in developing human limbs (E33, E41, E44, and E47) 25 .
Each DMR was allocated a random genomic position in its original chromosome, while keeping its original length and matching the distribution of GC-content and CpG density between the original and permutated lists. GC-content and CpG density matching was done by matching a 10-bin histogram of the original and permutated lists. This was repeated for 10,000 iterations.
Computing the density of changes along the genome We computed the density of derived CpG positions along the genome in two ways. First, we used a 100 kb window centered in the middle of each DMR and computed the fraction of CpGs in that window which are differentially methylated (i.e., are found within a DMR). Second, for the chromosome density plots, we did not center the window around each DMR, but rather used

NFIX phenotypes
Skeletal phenotypes that are associated with the Marshall-Smith syndrome were extracted from the Human Phenotype Ontology (HPO) 23 . Non-directional phenotypes (e.g., irregular dentition) and phenotypes that are expressed in both directions (e.g., tall stature and short stature) were removed.
Mutations in NFIX have also been linked to the Sotos syndrome. However, NFIX is not the only gene that was linked to this syndrome; mutations in NSD1 were also shown to drive similar phenotypes 31 . Therefore, it is less relevant in assessing the functional consequences of general shifts in the activity levels of NFIX. Nevertheless, it is noteworthy that in the Sotos syndrome too, most symptoms are a mirror image of the Neanderthal phenotype (e.g., prominent chin and high forehead).

Methylation in AMH, chimpanzee and Neanderthal femora
To check whether the AMH hypermethylation of SOX9, ACAN, COL2A1, XYLT1 and NFIX could be a result of variability between bone types, we compared the four chimpanzee femur 850K methylation arrays to the 52 present-day femur 450K methylation arrays. We took probes within AMH-derived DMRs that appear on both arrays. We found that these genes are consistently hypermethylated in AMHs (P = 1.6x10 -7 , t-test), with 38 probes showing >5% hypermethylation in AMH, whereas only eight probes show such hypermethylation in chimpanzees (Supplementary Figure 5d). Therefore, even when comparing methylation from the same bone, same sex, same developmental stage, measured by the same technology, and across the same positions, AMH show consistent hypermethylation across all of these DMGs.
Similarly, when comparing the DMRs in SOX9, ACAN, COL2A1, XYLT1, and NFIX between the Ust'-Ishim and Vindija Neanderthal samples, the Vindija Neanderthal sample is consistently hypomethylated compared to the Ust'-Ishim individual (P = 1.2 x 10 -5 , Supplementary Figure 5c, t-test). Both of these samples were extracted from femora of adult individuals, and methylation was reconstructed using the same technology. This suggests that the hypermethylation of AMHs compared to Neanderthals is unlikely to be driven by age or bone type, and rather reflects evolutionary shifts.
Scanning SOX9 for mutations altering NFI binding motifs To examine whether the changes in regulation of SOX9 could possibly be explained by changes in the binding sites of NFI proteins, we searched for the NFI motif along the gene body and the 350 kb upstream region of SOX9. We looked for NFI motifs that exist in the genomes of the Altai and Vindija Neanderthal, as well as in the Denisovan, but were abolished in AMHs. We did not find any evidence of such substitutions.

Comparison to divergent traits between Neanderthals and AMHs
To further investigate potential phenotypic consequences of the DMGs we report, we probed the HPO database 23 and compared these HPO phenotypes to known morphological differences between Neanderthals and modern humans 33  genes with the highest fraction of divergent traits between Neanderthals and AMHs (out of a total of 1,789 skeleton-related genes). In fact, COL2A1, which is the top ranked DMR (Supplementary Data 2), is also the gene that is overall associated with the highest number of derived traits (63) (Supplementary Data 7). This suggests that these extensive methylation changes are possibly linked to phenotypic divergence between archaic and AMHs.
individuals. In each two-way comparison, a C→T deamination signal of one hominin was compared to the reconstructed methylation map of the other hominin. This resulted in three lists of pairwise DMRs, that were then intersected to identify hominin-specific DMRs, defined as DMRs that appear in two of the lists. False discovery rates were controlled by running 100 simulations for each hominin, each simulating the processes of deamination, methylation reconstruction, and DMR-detection. Only DMRs that passed FDR thresholds of < 0.05 were kept. To discard non-evolutionary DMRs we used 62 skeletal methylation maps, and kept only loci whose methylation levels differed in one lineage, regardless of age, bone type, disease or sex. Finally, five chimpanzee methylation maps were used to assign the lineage in which each DMR likely emerged. compared to the Vindija Neanderthal (orange). Circles represent mean methylation levels in AMH-derived DMRs. Both samples were extracted from femora of adults, and methylation was reconstructed using the same method. The DMRs presented include also those that were analyzed in the density analyses. The hypermethylation of these genes in AMHs is unlikely to be attributed to age or bone type. t-test P-value is shown. d. COL2A1, ACAN, SOX9, and NFIX are hypermethylated in AMH femora compared to chimpanzee femora. Each pair of box plots represents methylation levels across 52 AMH femora (blue) and four chimpanzee femora