Introduction

Homo sapiens appears to be a “very special primate”1. Our position among animal species stands out largely thanks to the composite complexity of our cultures, social structures and communication systems. It seems reasonable that this “human condition” is rooted, at least in part, in the properties of our brain, and that these can be traced to changes in the genome on the modern human lineage. This phenotype in the population called “anatomically modern humans” emerged in Africa likely before the deepest divergence less than 100,000–200,000 years ago2,3, although complex population structure may reach back up to 300,000 years ago4,5,6. Except for some early dispersals7,8, humans most likely peopled other parts of the world than Africa and the Middle East permanently only after around 65,000 years ago. It has been claimed that the brain of modern humans adopted a specific, apomorphic growth trajectory early in life that gave rise to the skull shape difference between modern humans and extinct branches of the genus Homo9. Importantly, the growth pattern might differ between the populations10,11, with Neanderthal alleles influencing the endocranial shape in modern humans12, while the brain size and encephalization of humans and Neanderthals is similar9,13,14. This ontogenic trajectory, termed the “globularization phase”, might have contributed to cognitive changes that underlie behavioral traits in which humans differ from their extinct relatives, despite mounting evidence for their cognitive sophistication9,15,16,17,18.

We are now in a favorable position to examine the evolution of human biology with the help of the fossil record, in particular thanks to breakthroughs in paleogenomics: the recent reconstruction of the high-quality genomes of members of archaic Homo populations19,20,21 has opened the door to new comparative genomic approaches and molecular analyses. The split of the lineages leading to modern humans and other archaic forms (Neanderthals and Denisovans) is estimated at around 600,000 years ago2, setting the timeframe for truly modern human-specific changes after this split, but before the divergence of modern human populations (Fig. 1). Together with efforts to explore present-day human diversity22, this progress has made it possible to narrow down the number of candidate point mutations from ~35 million differences since the split from chimpanzee when comparing only reference genomes23 to 31,389 fixed human-specific changes in a previous seminal study1. Other types of more complex changes, like structural variants, most likely contributed to human-specific traits as well. For example, it is well known that since the split from chimpanzees functional differences arose through gene duplications in ARHGAP11B and other genes24,25, copy number variants in SRGAP2 and other genes26,27,28 or regulatory deletions29. In these cases, the variants arose before the split between humans and Neanderthals, but the differences in structural variation that exist between the hominin lineages30 need to be explored in more detail as ancient DNA sequencing technologies and computational methods advance. This will result in complementary lists of changes for understanding the human condition, which are outside the scope of this study.

Figure 1

Conceptual summary of this study.

Some of the single nucleotide changes have been linked to putative functional consequences1,20,31, and evidence is mounting that several molecular changes affecting gene expression in the brain were subject to selective pressures32,33,34,35,36. Furthermore, the genomic impact of interbreeding events is not evenly distributed across the genome. Genes expressed in regions of the brain regarded as critical for certain cognitive functions such as language are depleted in introgressed archaic genetic material37,38,39,40, and introgressed alleles are downregulated in some of these brain regions, suggesting natural selection acting on tissue-specific gene regulation41. Thus, it seems reasonable to conclude that there were differences between anatomically modern human and Neanderthal brains, and that these underlie at least some of the characteristics of our lineage42. We want to emphasize that such recent differences are likely to be subtle when compared to those after the split from our closest living relatives on a scale of 6–10 million years43, where fundamental changes arose since the divergence from chimpanzees and bonobos44,45. The observation of recurrent gene flow between modern human and archaic populations also implies a broad overall similarity, yet, such subtle differences may still have contributed to the evolutionary outcome18. This does not imply a superiority of humans, but specific changes that might have facilitated survival under the given environmental conditions. Obviously, not all human-specific changes are beneficial: while most mutations may be rather neutral and have little effect on the phenotype, some may have had deleterious effects or side-effects, possibly increasing the risks for neurodevelopmental or neurodegenerative disorders in humans46,47,48.

The goal of this paper is to provide a revised, extended set of recent single nucleotide changes in humans since their split from Neanderthals that could enrich our understanding of the molecular basis of the recent human condition. The previous focus on fixed alleles was reasonable given limited data1, but having a better grasp of the magnitude of modern human variation and the interaction between different hominin lineages seems a good reason to cast a wider net, and take into account not only fixed differences but also high-frequency changes shared by more than 90% of present-day individuals. Here, we present a revised list of 36 genes that carry missense substitutions that are fixed across thousands of human individuals and for which all archaic hominin individuals sequenced so far carry the ancestral state. In total, 647 protein-altering changes in 571 genes reached a frequency of at least 90% in the present-day human population. We attempt to interpret this list, as well as some regulatory changes, since it seems very likely that some of these genes would have contributed to the human condition. We discuss some of their known functions, and how these relate to pathways that might have been modified during human evolution from the molecular level to cellular features and more complex phenotypic traits (Fig. 1). We restrict our attention to genes where the literature allows reasonably firm conclusions and predictions about functional effects, since many genes likely have pleiotropic functions49. Obviously, it cannot be emphasized enough that ultimately, experimental validation will be needed to confirm our hypotheses concerning alterations in specific functions. For example, transcription factors or enzymatically active proteins can be tested using cell cultures or in vitro assays, while brain organoids could be used to test differences in neuronal functions50, especially in combination with single-cell RNA sequencing51,52,53. 
Ultimately, these variants can be introduced into model organisms like mice to test complex features related to cognitive abilities or behavior54. Still, given existing limitations on the number of changes that can be tested at once, networks that are modified by multiple changes cannot be tested with current technologies, which makes synthesizing attempts like ours valuable.

Results

Genetic differences between present-day humans and archaic hominins

Using publicly available data on one Denisovan and two Neanderthal individuals and present-day human variation (Methods), we calculated the numbers of single nucleotide changes (SNCs) which most likely arose recently on the respective lineages after their split from each other, and their functional consequences (Table 1). Previously, 31,389 sites have been reported as recently fixed for the derived allele in present-day humans, while being ancestral in archaics1,20. We find a smaller number of only 12,027 positions in the genome, in part due to the inclusion of another archaic individual and different filters, but mainly because of a richer picture of present-day human variation. The 1000 Genomes Project as well as other sources contributing to the dbSNP database now provide data for thousands of individuals, which results in very high allele frequencies for many loci, instead of full fixation. Indeed, 29,358 positions show allele frequencies larger than 0.995, demonstrating that the level of near-fixation is similar to the level of previously presented fixation. The number of loci with high frequency (HF) changes of more than 90% in present-day humans is an order of magnitude larger than the number of fixed differences. This cutoff is somewhat arbitrary and based on previous work20. However, when increasing the frequency cutoff, the number declines sharply, while decreasing it results in a near-linear increase of sites (Fig. S1). The three archaic individuals carry more than twice as many changes as present-day humans; however, we emphasize that much of this difference is not due to more mutations in archaics, but rather to the fact that data for only three individuals are available, compared to thousands of humans. The variation across the archaic population is not represented equally well, which makes these numbers not directly comparable. 
On the other hand, much less variation is found by the sequencing of each additional Neanderthal individual compared to humans due to the low diversity of Neanderthals (Fig. S36 in reference 22). This low diversity across their geographic range suggests that most alleles observed as ancestral here will be in the same state in other individuals. Furthermore, we take into account variability due to gene flow or errors, decreasing the possibility that positions ancestral in the archaic individuals studied to date turn out to be derived in most archaic individuals. This leads to the prediction that this extended catalog will likely not undergo drastic changes. However, changes in structural variants or regions of the genome that are not accessible by current sequencing technologies will most likely complement our results45.
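The distinction between fixed and high-frequency changes described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the tuple layout of the toy data, and the choice of an inclusive 90% boundary are assumptions made for illustration only.

```python
from collections import Counter

def classify_site(derived_freq, archaics_all_ancestral, hf_cutoff=0.90):
    """Label one site as 'fixed', 'high_frequency', or None.

    derived_freq: derived allele frequency in present-day humans (0..1).
    archaics_all_ancestral: True if every archaic genome carries the
    ancestral allele (hypothetical, pre-computed flag).
    hf_cutoff: illustrative cutoff; whether the boundary is inclusive
    is an assumption of this sketch.
    """
    if not archaics_all_ancestral:
        return None
    if derived_freq >= 1.0:
        return "fixed"
    if derived_freq >= hf_cutoff:
        return "high_frequency"
    return None

def count_classes(sites, hf_cutoff=0.90):
    """Count fixed and high-frequency sites in an iterable of
    (derived_freq, archaics_all_ancestral) tuples."""
    counts = Counter()
    for derived_freq, archaics_all_ancestral in sites:
        label = classify_site(derived_freq, archaics_all_ancestral, hf_cutoff)
        if label is not None:
            counts[label] += 1
    return counts

# Toy data, invented for illustration only.
toy_sites = [(1.0, True), (0.997, True), (0.93, True), (0.85, True), (1.0, False)]
print(count_classes(toy_sites))
print(count_classes(toy_sites, hf_cutoff=0.995))
```

Raising `hf_cutoff` in such a sketch shrinks the high-frequency class, which mirrors the dependence of site counts on the cutoff discussed in the text.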

Table 1 Summary of single nucleotide changes.

Present-day humans carry 42 fixed amino acid changes in 36 genes (Table 2, Fig. 2), while Neanderthals carry 159 such changes. Additionally, modern humans carry 605 amino acid changes at high frequency (human-lineage high-frequency missense changes, referred to as HHMCs), amounting to a total of 647 such changes in 571 genes (Table S1). Together with 323 SNCs on the human lineage with low confidence (Methods, Table S2), almost 1,000 putative protein-altering changes were found across most present-day humans. Generally, synonymous changes are found at a similar magnitude as missense changes, but only a few SNCs alter start and stop codons, whereas thousands of changes fall in putative regulatory and untranslated regions. We admit that some of the loci presented here are variable across the phylogenetic tree, or less reliable due to low coverage in the archaics, but we accept this since our intention is to retrieve an inclusive picture of possibly functional recent changes. The 42 protein-altering changes for which the ancestral allele has not been observed in any present-day human, most of which have been presented before1, constitute without doubt the strongest entry points into a molecular understanding of the human condition, and should be prime candidates for experimental validation. Only one gene, SPAG5, carries three such SNCs, and four genes (ADAM18, CASC5, SSH2 and ZNHIT2) carry two fixed protein-coding changes in all modern humans. We identified 15 SNCs (in AHR, BOD1L1, C1orf159, C3, DNHD1, DNMT3L, FRMD8, OTUD5, PROM2, SHROOM4, SIX5, SSH2, TBC1D3, ZNF106, ZNHIT2) that have not been previously described as fixed differences between humans and archaics. We note that another 12 previously described1 protein-altering substitutions were not found among the genotypes analyzed here (in C21orf62, DHX29, FAM149B1, FRRS1L, GPT, GSR, HERC5, IFI44L, KLF14, PLAC1L, PTCD2, SCAF11). 
These genotype calls are absent from the files provided for the three archaic genomes due to different genotype calling and filtering procedures compared to the original publication of the Altai Neanderthal genome20,21. Hence, some potentially relevant candidate changes were not included here, and future research is necessary to evaluate these as well. Despite attempting an extended interpretation, our data is thus not fully exhaustive.

Table 2 Genes with fixed non-synonymous changes on the human lineage, genes under positive selection with HHMCs, and deleterious candidate HHMCs.
Figure 2

Features discussed in this study. From inside to outside: Genes with HHMCs and signatures of positive selection (compare Table 2), genes with fixed non-synonymous SNCs on the human lineage, HHMCs, AHMCs, karyogram of human chromosomes.

It is noteworthy that the number of fixed SNCs decreased substantially, and it is possible that single individuals will be found to carry some of the ancestral alleles for the remaining fixed sites. Hence, it is important to focus not only on fixed differences, but also consider variants at high frequency. When analyzing the 647 HHMCs, 68 genes carry more than one amino acid-altering change. Among these, TSGA10IP (Testis Specific 10 Interacting Protein) and ABCC12 (ATP Binding Cassette Subfamily C Member 12) carry four such changes, and seven more genes (MUC5B, NPAP1, OR10AG1, OR5M9, PIGZ, SLX4, VCAN) carry three HHMCs. 1,542 genes carry at least one HF missense change on the archaic lineage (archaic-lineage high-frequency missense change, referred to as AHMC, Tables S3, S4). We find an overlap of 122 genes with HHMCs and AHMCs, which is more than expected considering that among 1,000 sets of random genes of a similar length distribution, no overlap of this extent was observed. The same genes seem to have acquired missense changes on both lineages since their divergence more often than expected. We find a high ratio of HHMCs over synonymous changes for chromosome 21 (1.75-fold), and a very small ratio (0.18-fold) for chromosome 13. We do not find such extreme ratios for AHMCs and corresponding synonymous changes, suggesting differences in the distribution of amino acid changes between both lineages (Fig. S2).
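The comparison of the observed HHMC/AHMC gene overlap against random gene sets of similar length distribution can be sketched as follows. This is only one possible way to set up such a test: matching on log2 length bins, the function names, and the input dictionary `gene_lengths` are assumptions of this sketch, not a description of the procedure actually used.

```python
import math
import random
from collections import Counter, defaultdict

def length_matched_sample(target_genes, gene_lengths, rng):
    """Draw a random gene set with the same number of genes per
    log2 length bin as target_genes (binning scheme is an assumption)."""
    bins = defaultdict(list)
    for gene, length in gene_lengths.items():
        bins[int(math.log2(length))].append(gene)
    needed = Counter(int(math.log2(gene_lengths[g])) for g in target_genes)
    sample = set()
    for length_bin, k in needed.items():
        # Sample without replacement within each length bin.
        sample.update(rng.sample(bins[length_bin], k))
    return sample

def empirical_overlap_p(set_a, set_b, gene_lengths, n_sets=1000, seed=0):
    """Return (observed overlap, fraction of random length-matched
    replacements of set_a that overlap set_b at least as much)."""
    rng = random.Random(seed)
    observed = len(set_a & set_b)
    hits = 0
    for _ in range(n_sets):
        random_a = length_matched_sample(set_a, gene_lengths, rng)
        if len(random_a & set_b) >= observed:
            hits += 1
    return observed, hits / n_sets
```

An empirical fraction of zero among 1,000 random sets, as reported above, would correspond to a return value of `(122, 0.0)` in such a setup; it bounds the probability from above rather than estimating it exactly.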

Ranking and enrichment

We assessed the impact of mutations for different deleteriousness scores (Table 2), finding 12 genes with deleterious HHMCs according to SIFT, 3 according to PolyPhen, and 16 when using the Grantham score (>180), measuring the physical properties of amino acid changes. The C-score and GWAVA can be used to rank all mutation classes, including non-coding changes, and we present the top candidates.

Then, we attempted a ranking of genes by the density of lineage-specific changes in the dataset. As expected, the total number of segregating sites is correlated with gene length (Pearson’s R = 0.93). This correlation is weaker for HF human SNCs (R = 0.73) and fixed human-specific SNCs (R = 0.25), as well as for fixed (R = 0.37) and HF (R = 0.82) SNCs in archaics. We conclude that some genes with a large number of human-specific changes might carry these large numbers by chance, while others are depleted. Indeed, 17,453 (88.9%) of these genes do not carry any fixed human-specific change, and 80.5% do not carry fixed archaic-specific changes. Of note, genes that have attracted attention in the context of traits related to the “human condition”, like CNTNAP2 and AUTS2, are among the longest genes in the genome; hence, changes in these genes should be interpreted with caution, as they are not unexpected. We ranked the genes by the number of HF changes in either modern humans or archaics, divided by their genomic lengths, and categorized the top 5% of this distribution as putatively enriched for changes on each lineage (Table S5). We note that 191 genes (30.9%) fall within this category for both human HF changes and archaic HF changes, as a result of differences in mutation density. In order to distinguish a truly lineage-specific enrichment, we calculated the ratios of HF changes for humans and archaics, defining the top 10% of genes in this distribution as putatively enriched (Table S5). Among the genes enriched for changes on the modern human lineage, 18 carry no HF changes on the archaic lineage, and ten of these also fall within the 5% of genes carrying many changes considering their length (ARSJ, CLUAP1, COL20A1, EPPIN, KLHL31, MKNK1, PALMD, RIC3, TDRD7, UBE2H). These might be candidates for an accumulation of changes, even though this is not identical to selective sweep signals. 
Among these, the collagen COL20A1 and the Epididymal Peptidase Inhibitor EPPIN carry HHMCs. ACAD10, DST and TTC40, which carry two HHMCs, might be other notable genes with a human-specific enrichment.
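The two rankings described above (HF changes per unit of gene length, and the ratio of human to archaic HF changes) can be illustrated with a short Python sketch. The input dictionaries, function names, tie handling, and in particular the pseudocount used to avoid division by zero are assumptions introduced here; the text does not specify how genes without archaic HF changes were handled in the ratio.

```python
def top_fraction(scores, fraction):
    """Return the genes whose score lies in the top `fraction`
    of the distribution (at least one gene)."""
    ranked = sorted(sorted(scores), key=scores.get, reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    return set(ranked[:n_top])

def density_enriched(hf_counts, gene_lengths, fraction=0.05):
    """Rank genes by HF changes divided by genomic length."""
    density = {
        gene: hf_counts.get(gene, 0) / length
        for gene, length in gene_lengths.items()
    }
    return top_fraction(density, fraction)

def lineage_ratio_enriched(human_counts, archaic_counts,
                           fraction=0.10, pseudocount=1):
    """Rank genes by the ratio of human to archaic HF changes.

    The pseudocount is an assumption of this sketch, added so that
    genes with zero archaic HF changes still receive a finite ratio.
    """
    genes = sorted(set(human_counts) | set(archaic_counts))
    ratio = {
        gene: (human_counts.get(gene, 0) + pseudocount)
        / (archaic_counts.get(gene, 0) + pseudocount)
        for gene in genes
    }
    return top_fraction(ratio, fraction)
```

Intersecting the two returned sets gives genes that are both dense in HF changes for their length and skewed toward one lineage, which corresponds to the kind of candidate list given in the text.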

Gene Ontology (GO) categories are enriched neither for genes with HHMCs on the human lineage in a hypergeometric test, nor for genes carrying AHMCs, HF changes in UTRs or transcription factor binding sites. However, instead of singular changes that might be observed more often in long genes, or genes that are more prone to mutations in hominins, the density of HF changes in a gene might yield a better picture of lineage-specific changes, possibly for cumulative changes. We applied a test for the ratio of the number of gene-wise HF changes on one lineage over the other lineage, finding an enrichment for 12 GO categories on the human lineage (Table S6), with “soft palate development”, “negative regulation of adenylate cyclase activity”, “collagen catabolic process” and “cell adhesion” in the biological process category. In the cellular component category, the “postsynaptic membrane”, “spermatoproteasome complex”, “collagen trimer”, “dendrite” and “cell junction” show enrichment, as do the molecular functions “calcium ion binding”, “histone methyltransferase activity (H3-K27 specific)” and “metallopeptidase activity”. We find no GO enrichment for genes with an excess of changes on the archaic lineage. In order to explore genes associated with complex traits in humans more deeply, we used the NHGRI-EBI GWAS Catalog55, containing 2,385 traits. We performed a systematic enrichment screen, finding 17 unique traits enriched for genes with HHMCs, and 11 for genes with AHMCs (Table S7). Changes in genes associated with “Cognitive decline (age-related)”, “Rheumatoid arthritis” or “Major depressive disorder” might point to pathways that could have been influenced by protein-coding changes on the human lineage. In archaics, genes are enriched, among others, for associations with traits related to body mass index or cholesterol levels, which might reflect differences in their physiology.
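For readers unfamiliar with the hypergeometric test mentioned above, the following Python sketch shows the underlying calculation for a single category. It uses only the standard library; the function name and the toy numbers are illustrative assumptions, and the software and multiple-testing handling used in the actual analysis are not reproduced here.

```python
from math import comb

def hypergeom_enrichment_p(k, n_background, n_annotated, n_selected):
    """One-sided enrichment p-value P(X >= k).

    k:            selected genes that carry the annotation
    n_background: all genes considered
    n_annotated:  background genes that carry the annotation
    n_selected:   genes in the list of interest (e.g. genes with HHMCs)
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    favorable = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_selected - i)
        for i in range(k, upper + 1)
    )
    return favorable / total

# Toy example with invented numbers: 4 of 10 selected genes are annotated,
# while 20 of 1,000 background genes carry the annotation.
p_value = hypergeom_enrichment_p(4, 1000, 20, 10)
print(p_value)
```

A small p-value indicates that the category is observed in the gene list more often than expected when drawing the same number of genes at random from the background.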

We find a significant enrichment of protein-protein interactions (P = 0.006) among the gene products of HHMC genes (Fig. S3), meaning that these proteins interact with each other more than expected. Functional enrichment is found for the biological process “cellular component assembly involved in morphogenesis”, most strongly for the cellular components cytoskeleton and microtubule, as well as the molecular function “cytoskeletal protein binding”. Three proteins have at least 20 interactions in this network and might be considered important nodes: TOP2A, PRDM10 and AVPR2 (Table S8). However, proteins encoded by genes with synonymous changes on the modern human lineage seem to be enriched for interactions as well (P = 0.003), as are proteins encoded by genes with AHMCs (P = 1.68 × 10^−14), with an enrichment in GO categories related to the extracellular matrix and the cytoskeleton, and proteins with more than 40 interactions (Table S8). We caution that these networks might be biased due to more mutations and possibly more interactions in longer, multi-domain genes.

Regulatory changes might have been important during our evolution56, hence we tested for an overrepresentation of transcription factors. We find 78 known or putative transcription factors among the HHMC genes (Table S9) on the modern human lineage57, which does not constitute an overrepresentation among genes with HHMCs (with 49.2% of random gene sets containing fewer HHMCs). Despite no enrichment as a category, single transcription factors on the modern human lineage might have been important, particularly those with an excess of modern human over archaic HF changes (AHR, MACC1, PRDM2, TCF3, ZNF420, ZNF516). Others, like RB1CC120 or PRDM10 and NCOA633, have been found in selective sweep screens, suggesting contributions of individual transcription factors, rather than the whole class of proteins. We also tested for an enrichment of gene expression in different brain regions and developmental stages58,59, using the HF synonymous changes on each lineage as background sets. We find an enrichment of gene expression in the orbital frontal cortex at infant age (0–2 years) for genes with HHMCs, but no enrichment for genes with AHMCs. Furthermore, when testing the genes with HHMCs and using the set of genes with AHMCs as background, “gray matter of forebrain” at adolescent age (12–19 years) is enriched, while no enrichment was found for genes with AHMCs.

Discussion

The enrichment of broad categories above suggests traits prominently represented by HHMCs, some of which are possibly brain-related. It should be noted that such results would be less clear if we focused only on completely fixed changes, given the drastically reduced number of genes harboring such changes. It seems likely that many human-specific traits will be the consequence of cumulative changes rather than of a single change60. Hence, we suggest that the “full modernity” of modern humans is constituted by a network of changes, where the presence of single ancestral alleles in some individuals would not lead to a “partially modern” phenotype. Here, we will further examine the possible impact on the brain that some of these changes might have, paying special attention to hypotheses formulated in earlier work on modern human-specific changes. Our extended catalog of changes appears to provide additional support for some of these hypotheses.

Cell division and the brain growth trajectory

It has been proposed previously that protein-coding changes in cell cycle-related genes are highly relevant candidates for human-specific traits1,20, with the brain being specifically sensitive to such changes61. Indeed, three genes (CASC5, SPAG5, and KIF18A) have been singled out as involved in spindle pole assembly during mitosis1. Other genes with protein-coding SNCs (NEK6, STARD9/KIF16A and CDK5RAP2) turn out to be implicated in the regulation of spindle pole assembly as well62,63,64. Among the 15 fixed protein-coding changes identified here but absent from previous analyses1,20, some might have also contributed to complex modifications of pathways in cell division, like AHR65 or DNHD166 (Supplementary Information 1), as well as other genes with HHMCs, like CHEK167 or the gene encoding for the protein TOP2A68, which shows the largest number of interactions with other HHMC-carrying proteins, suggesting a function as interaction hub in the cell division complex (Supplementary Information 1). Taken together, these changes suggest that the cell cycle machinery might have been modified in a specific way in humans compared to other hominins.

It has been claimed20 that genes with fixed non-synonymous changes in humans are also more often expressed in the ventricular zone of the developing neocortex, compared to fixed synonymous changes. Since the kinetochore-associated genes CASC5, KIF18A and SPAG5 are among these genes, it has been emphasized that this “may be relevant phenotypically as the orientation of the mitotic cleavage plane in neural precursor cells during cortex development is thought to influence the fate of the daughter cells and the number of neurons generated69,20”. Several fixed SNCs on the modern human lineage are observed for CASC5 (two changes) and SPAG5 (three changes), which is also among genes with a relatively high proportion of HF changes (Table S5). The changes in KIF18A, KIF16A and NEK6 can no longer be considered as fixed, but occur at very high frequencies (>99.9%) in present-day humans. We attempted to determine whether an enrichment of genes with HHMCs on the human lineage can be observed in the ventricular zone59, but instead find an enrichment in the intermediate zone, where less than 5% of random gene sets of the same size are expressed. However, synonymous HF changes also show an enrichment in this layer, as well as genes with AHMCs (Table S10), suggesting an overrepresentation of genes that carry mutations in the coding regions rather than lineage-specific effects. We were able to broadly recapitulate the observation of an enrichment of expression in the ventricular zone when restricting the test to genes with non-synonymous changes at a frequency greater than 99.9% in present-day humans, which is not observed for corresponding synonymous and archaic non-synonymous changes (Table S10). Among the 28 genes expressed in the ventricular zone that carry almost fixed HHMCs, four might be enriched for HF changes in humans (HERC5, LMNB2, SPAG5, VCAM1), and one shows an excess of HF changes on the human compared to the archaic lineage (ANKMY1). 
Other notable genes discussed in this study include ADSL, FAM178A, KIF26B, SLC38A10, and SPAG17.

The centrosome-cilium interface is known to be critical for early brain development, and centrosome-related proteins are overrepresented in studies on the microcephaly phenotype in humans70. We find 126 genes (Table S9) with 143 HHMCs that putatively interact with proteins at the centrosome-cilium interface71. Some of the genes listed here and discussed in this study, such as FMR1, KIF15, LMNB2, NCOA6, RB1CC1, SPAG5 and TEX2, harbor not only HHMCs, but an overall high proportion of HF changes on the human lineage. Although an early analysis suggested several candidate genes associated to microcephaly, not all of these could be confirmed by high-coverage data. Among eleven candidate genes32, only two (PCNT, UCP1) are among the HHMC gene list presented here, while most of the other changes are not human-specific, and only PCNT has been related to microcephaly72. Nevertheless, more changes related to microcephaly are found on both lineages, for example in ATRX73 or CASC574 (Supplementary Information 3).

Changes in genes associated with brain growth trajectory differences do not necessarily lead to a decrease, but can also lead to an increase, of brain size75, suggesting that the disease phenotype of macrocephaly might point to genes relevant in the context of brain growth as well. One of the few genes with several HHMCs, CASC5, has been found to be associated with gray matter volume differences76. It has been claimed that mutations in PTEN alter the brain growth trajectory and allocation of cell types through elevated Beta-Catenin signaling77. This well-known gene, critical for brain development78, has not been highlighted in the context of human-specific changes, while we find that PTEN falls among the genes with an excess of HF changes on the modern human over the archaic lineage, suggesting that regulatory changes in this gene might have contributed to human-specific traits. This is also the case for the HHMC-carrying transcription factor TCF3, which is known to repress Wnt-Beta-Catenin signaling and maintain the neural stem cell population during neocortical development79. Changes in these and other genes (Supplementary Information 3) like CCND280, GLI381, or RB1CC182, for which a regulatory SNC has been suggested to modify transcriptional activity83 and which carries a signature of positive selection20, could have contributed to the brain growth trajectory changes hypothesized to give rise to the modern human-specific globular braincase shape during the past several hundred thousand years9,11,15. Finally, we find changes that might have affected the size of the cerebellum, a key contributor to our brain shape11,84, possibly even since the split from Neanderthals85: for example, HF regulatory SNCs in ZIC1 and ZIC486, an excess of HF mutations in AHI187, and a deleterious HHMC in ABHD14A, which is a target of ZIC188.

Cellular features of neurons

To form critical networks during the early development of the brain, axonal extensions of the neurons in the cortical region must be sent and guided to eventually reach their synaptic targets. Studies conducted on avian vocal learners89,90 have shown a convergent differential regulation of axon guidance genes of the SLIT-ROBO families in the pallial motor nucleus of the learning species, allowing for the formation of connections virtually absent in the brains of vocal non-learners. In modern humans, genes with axon-guidance-related functions such as FOXP2, SLIT2 and ROBO2 have been found to lie within deserts of archaic introgression39,40,91, suggesting incompatibilities between modern humans and archaics for these regions. Even though these particular genes do not carry protein-coding changes, but potentially relevant regulatory changes92, our dataset contains a fair amount of genes known to impact brain wiring: Some of the aforementioned microtubule-related genes, specifically those associated with axonal transport and known to play a role in post-mitotic neural wiring and plasticity93, are associated with signals of positive selection, such as KIF18A94 or KATNA195,96. Furthermore, an interactor of KIF18A, KIF1597, might have been under positive selection in modern humans33, and contains two HHMCs. Versican (VCAN), which promotes neurite outgrowth98, carries three HHMCs, and SSH2 (two HHMCs) might be involved in neurite outgrowth99. PIEZO1, which carries a non-synonymous change that is almost fixed in modern humans, is another factor in axon guidance100, as well as NOVA1101, which is an interactor of ELAVL4102, a gene that codes for a neuronal-specific RNA-binding protein and might have been under positive selection in humans33,36. Furthermore, we find one of the most deleterious regulatory SNCs in the Netrin receptor UNC5D, which is critical for axon guidance103.

We also detect changes in genes associated with myelination and synaptic vesicle endocytosis, critical to sustain a high rate of synaptic transmission, including DCX104, SCAP105, RB1CC1106, ADSL107 and PACSIN1108 among others (Supplementary Information 2). It is noteworthy that among traits associated with cognitive functions such as language or theory of mind, the timing of myelination appears to be a good predictor of computational abilities109,110. Computational processing might have been facilitated by some of the changes presented here, at least in some of the circuits that have expanded in our lineage111,112, since subtle maturational differences early in development113 may have had a considerable impact on the phenotype. In this context, it is worth mentioning that in our dataset, several genes carrying HHMCs and associated with basal ganglia functions (critical for language and cognition) stand out, like SLITRK1114 and NOVA136,115,116,117,118 (Supplementary Information 4). Finally, in the broader context of traits potentially related to cognition, we find an enrichment of HHMCs in genes associated with “Alzheimer’s disease (cognitive decline)” and “Cognitive decline (age-related)”, with seven associated genes (COX7B2, BCAS3, DMXL1, LIPC, PLEKHG1, TTLL2 and VIT). Among genes influencing behavioral traits (Supplementary Information 4) are GPR153119, NCOA6120, or the Adenylosuccinate Lyase (ADSL)121, for which the ancestral Neanderthal-like allele has not been observed in thousands of modern human genomes and which has been pointed out before as under positive selection31,33,34,122. We know that archaic hominins likely had certain language-like abilities123,124, and hybrids of modern and archaic humans must have survived in their communities125, underlining the large overall similarity of these populations. 
However, genes associated with axon guidance functions, which are important for the refinement of neural circuits including those relevant for speech and language, are found in introgression deserts126,127. We suggest that modifications of a complex network in cognition or learning took place in modern human evolution128, possibly related to other brain-related9,16,129,130, vocal tract131 or neural changes132.

The craniofacial phenotype

In previous work on ancient genomes, changes related to craniofacial morphology have been highlighted31,131, and we find an enrichment of genes with an excess of HF SNCs on the modern human lineage for soft palate development (Table S6). Among genes harboring an excess of HF SNCs associated with specific facial features, we find RUNX2, EDAR, and GLI3133, NFATC1134, SPOP135, DDR2136 and NELL1137, possibly carrying changes in regulatory regions, while mutations in the HHMC-carrying gene encoding the transcription factor ATRX cause facial dysmorphism138. In addition, genes with HHMCs such as PLXNA2139, EVC2140, MEPE141, OMD142, and SPAG17143 are known to affect craniofacial bone and tooth morphologies. These genes appear to be important in determining bone density, mineralization and remodeling, hence they may underlie differences between archaic and modern human facial growth144. Some of these facial properties may have been present in the earliest fossils attributed to H. sapiens, like the Jebel Irhoud fossils4, deviating from craniofacial features which emerged in earlier forms of Homo145, and may have become established before some brain-related changes discussed here11,146. The gene encoding the transcription factor PRDM10 stands out for carrying HHMCs, being found in selective sweep regions and encoding the second-most interacting protein within the HHMC dataset. Although little is known about PRDM10, it may be related to dendrite growth147 and neural crest related changes that contributed to the formation of our distinct modern face148. Other craniofacial morphology-related genes, such as DCHS2133, HIVEP2149, HIVEP3150, FREM1151, and FRAS1152 harbor AHMCs, while another bone-related gene, MEF2C153, shows an excess of HF changes on the archaic lineage. These changes may underlie some of the prominent derived facial traits of Neanderthals154,155.

Life history and other phenotypic traits

Apart from their consequences for cognitive functions, it has been suggested that changes involved in synaptic plasticity might be interpreted in a context of neoteny33,156,157,158, with the implication of delayed maturation in humans159 and a longer timeframe for brain development. However, given their similar brain sizes160, humans and Neanderthals might both have needed a long overall maturation time161,162. Accordingly, notions like neoteny and heterochrony are unlikely to be fine-grained enough to capture differences between these populations, but early differences in infant brain growth between humans and Neanderthals9,10 could have rendered our maturational profile distinct during limited developmental periods and within specific brain regions, imposing different metabolic requirements163. One of the brain regions where such differences are found is the orbitofrontal cortex (OFC)129, and we find that the OFC at infant age (0–2 years) is enriched for the expression of genes that carry HHMCs compared to synonymous SNCs. We suggest that the development of the OFC in infants might have been subject to subtle changes since the split from Neanderthals rather than a general developmental delay, which is particularly interesting given that this brain region has been implicated in social cognition164 and learning165.

Genes carrying HHMCs are enriched for expression in the gray matter of the forebrain at the adolescent age compared to AHMC-carrying genes, hence additional human-specific modifications during this period might have taken place, possibly linked to changes in myelination described above. It has been suggested that differences in childhood and adolescence time existed between humans and Neanderthals, after a general developmental delay in the hominin lineage166,167. Dental evidence suggests an earlier maturation in Neanderthals than in modern humans168, and it has been claimed that Neanderthals might have reached adulthood earlier169. Furthermore, an introgressed indel from Neanderthals causes an earlier onset of menarche in present-day humans30, supporting at least the existence of alleles for earlier maturation in the Neanderthal population. Among the genes carrying fixed HHMCs, NCOA6 has also been linked to age at menarche and onset of puberty170, as well as placental function171. This putative transcription factor is enriched in HF changes and has been suggested to have been under positive selection on the modern human lineage33,122. The HHMC is located near, and three 5′-UTR variants lie within, a putatively selected region36, with an estimated time of selection at around 150 kya (assuming a slow mutation rate). Even though this gene carries an AHMC as well, it remains possible that modern humans acquired subtle differences in their reproductive system through lineage-specific changes in this gene. A delay in reproductive age may influence overall longevity, another trait for which our data set yields an enrichment of genes with HHMCs (SLC38A10, TBC1D22A and ZNF516).

The male reproductive system might have been subject to changes as well, since we find that several proteins in spermatogenesis seem to carry two HHMCs: Sperm Specific Antigen 2 (SSFA2), Sperm Associated Antigen 17 (SPAG17), ADAM18172 and WDR52173, out of which ADAM18 and SPAG17 also carry AHMCs. Lineage-specific differences in genes related to sperm function or spermatogenesis might have been relevant for the genetic compatibility between humans and Neanderthals. Another gene harboring a HHMC with similar functions is EPPIN174, which shows no HF changes on the archaic lineage but 27 such SNCs on the modern human lineage. The gene encoding the Testis Expressed 2 protein (TEX2) is enriched for HF changes in both humans and archaics, with one HHMC and five AHMCs, but its function is not yet known. Another SNC that might be relevant in this context is a splice site change in IZUMO4, since proteins encoded by the IZUMO family form complexes on mammalian sperm175. The adjacent exon is not present in all transcripts of this gene, suggesting a functional role of this splice site SNC. Finally, genes in the GO category “spermatoproteasome complex” are enriched for an excess of HF changes on the human lineage.

It has been found that Neanderthal alleles contribute to addiction and, possibly, pain sensitivity in modern humans176,177. In this context, an interesting protein-truncating SNC at high frequency in humans is the loss of a stop codon in the opioid receptor OPRM1 (6:154360569), potentially changing the structure of the protein encoded by this gene in some transcripts. Other mutations in this gene are associated with heroin addiction178 and pain perception179, but also with sociality traits180. Interestingly, a recent study found a pain insensitivity disorder caused by a mutation in ZFHX2181, which carries an AHMC, and three HHMCs are observed in NPAP1, which might be associated with the Prader-Willi syndrome, involving behavioral problems and a high pain threshold182. Such changes may point to differences in levels of resilience to pain between Neanderthals and modern humans.

Conclusion

The long-term evolutionary processes that led to the human condition1 are still subject to debate and investigation, and the high-quality genomes from archaic humans provide opportunities to explore the recent evolution of our species. We want to contribute to an attempt to unveil the genetic basis of specific molecular events in the time-window after the split from these archaic populations and before the emergence of most of the present-day diversity. It needs to be emphasized again that this does not imply a superiority of humans over other populations, but rather small differences after a long shared evolutionary history and their genetic underpinnings. We sought to combine different sources of information, from genome-wide enrichment analyses to functional information available for specific genes, to identify threads linking molecular needles in this expanded haystack. In doing so, we have mainly built on existing proposals concerning brain-related changes, but we have divided the observations into different biological levels, from cellular changes through brain organization differences to complex phenotypic traits. Only future experimental work will determine which of the changes highlighted here contributed significantly to making us “fully human”. We hope that our characterization and presentation of some new candidate genes will help prioritize inquiry in this area, since the specific type of validation depends on each candidate gene or network.

Methods

We used the publicly available high-coverage genotypes for three archaic individuals: One Denisovan19, one Neanderthal from the Denisova cave in the Altai mountains20, and another Neanderthal from Vindija cave, Croatia21. The data is publicly available under http://cdna.eva.mpg.de/neandertal/Vindija/VCF/, with the human genome version hg19 as reference, covering ~1.8 billion base pairs of the genome21. We applied further filtering to remove sites with less than 5-fold coverage, as well as sites with more than 105-fold coverage in the Altai Neanderthal or more than 75-fold coverage in the other archaic individuals, if such cases occurred. We also removed sites with genotype quality below 20, and heterozygous sites with strong allele imbalance (<0.2 minor allele frequency). Although these permissive filters increase power compared to previous studies, we caution that in some cases genotypes might be incorrect. We added the genotype and coverage for the exome and chromosome 21 sequences of the Vindija and El Sidrón Neanderthals from previous studies2,31, with 75-fold and 50-fold coverage cutoffs, respectively. These studies provided data for the same Vindija individual21.
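The per-site filters described above can be illustrated with a minimal Python sketch. This is illustrative code rather than the pipeline used for the analysis: the `Call` record, its field names and the individual labels are hypothetical, and the allele imbalance is assumed to be measured as the fraction of reads supporting the minor allele.

```python
from dataclasses import dataclass

# Cutoffs taken from the text; the dictionary keys are hypothetical labels.
MIN_DEPTH = 5
MAX_DEPTH = {"altai": 105, "vindija": 75, "denisovan": 75}
MIN_GQ = 20
MIN_MINOR_FRACTION = 0.2


@dataclass
class Call:
    """One genotype call at one site (hypothetical record layout)."""
    depth: int          # total read depth at the site
    gq: int             # genotype quality
    ref_reads: int      # reads supporting the reference allele
    alt_reads: int      # reads supporting the alternative allele
    heterozygous: bool  # True if the called genotype is heterozygous


def passes_filters(call: Call, individual: str) -> bool:
    """Return True if the call survives the coverage, quality and balance filters."""
    if call.depth < MIN_DEPTH or call.depth > MAX_DEPTH[individual]:
        return False
    if call.gq < MIN_GQ:
        return False
    if call.heterozygous:
        total = call.ref_reads + call.alt_reads
        if total == 0:
            return False
        # Assumption: imbalance is the read fraction of the rarer allele.
        if min(call.ref_reads, call.alt_reads) / total < MIN_MINOR_FRACTION:
            return False
    return True
```

In this sketch a site with 90-fold coverage would be kept for the Altai Neanderthal but removed for the other individuals, which mirrors the individual-specific upper cutoffs.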

We applied the Ensembl Variant Effect Predictor VEP183 in order to obtain inferences for protein-coding and regulatory mutations, scores for SIFT184, PolyPhen185, CADD186 and GWAVA187, and allele frequencies in the 1000 Genomes and ExAC human variation databases22,188. We used the inferred ancestral allele from published data on multiple genome alignments189, and at positions where this information was not available, the macaque reference allele, rheMac3190. We determined the allele frequencies in present-day humans using the dbSNP database build 147191. We retrieved the counts for each allele type, and summarized the counts of non-reference alleles at each position. Grantham scores192 were calculated for missense mutations.

Data processing and database retrieval were performed using bcftools/samtools v1.0193, bedtools v2.16.2194, and R/Bioconductor195, with rtracklayer196 and biomaRt197 packages, and plotting with Rcircos198. We analyzed all positions where at least two alleles (human reference and alternative allele) were observed among the human reference and at least one out of three of the high-coverage archaic individuals, in at least one archaic chromosome. The 22 autosomal chromosomes and the X chromosome were analyzed, in the absence of Y chromosome data for the three female archaic individuals. The data for 4,437,803 segregating sites is available under http://cbl.ub.edu/index.php/resources and on Figshare under https://doi.org/10.6084/m9.figshare.8184038. The following subsets were created:

Fixed differences: Positions where all present-day humans carry a derived allele, while at least two out of three archaics carry the ancestral allele, accounting for potential human gene flow into Neanderthals.

High-frequency (HF) differences: Positions where more than 90% of present-day humans carry a derived allele, while at least the Denisovan and one Neanderthal carry the ancestral allele, accounting for different types of errors and bi-directional gene flow.

Extended high-frequency differences: Positions where more than 90% of present-day humans carry a derived allele, while one of the following conditions is true: (a) Not all archaics have reliable genotypes, but those that have carry the ancestral allele. (b) Some archaics carry an alternative genotype that is not identical to either the human or the ancestral allele. (c) The Denisovan carries the ancestral allele, while one Neanderthal carries a derived allele, which allows for gene flow from humans into Neanderthals. (d) The ancestral allele is missing in the EPO alignment, but the macaque reference sequence is identical to the allele in all three archaics.

We also created corresponding lists of archaic-specific changes. Fixed changes were defined as sites where the three archaics carry the derived allele, while humans carry the ancestral allele at more than 99.999%. High-frequency changes occur at less than 1% in present-day humans, while at least two archaic individuals carry the derived allele. An extended list presents high-frequency changes where the ancestral allele is unknown, but the macaque allele is identical to the present-day human allele.
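The fixed and high-frequency definitions for both lineages can be summarized in a short Python sketch. It is a simplification under stated assumptions: each archaic individual is reduced to a single hypothetical state label ("anc", "der" or None for a missing genotype), the extended lists are not modeled, and the fixed category is checked before the high-frequency one.

```python
def classify_human_lineage(human_derived_freq, denisovan, neanderthals):
    """Classify a site as a change on the modern human lineage.

    human_derived_freq: derived allele frequency in present-day humans (0..1).
    denisovan: "anc", "der" or None; neanderthals: list of two such labels.
    """
    archaics = [denisovan] + list(neanderthals)
    n_ancestral = sum(1 for state in archaics if state == "anc")
    # Fixed: all humans derived, at least two of three archaics ancestral.
    if human_derived_freq == 1.0 and n_ancestral >= 2:
        return "fixed"
    # HF: >90% derived, Denisovan and at least one Neanderthal ancestral.
    if human_derived_freq > 0.9 and denisovan == "anc" and "anc" in neanderthals:
        return "high-frequency"
    return "other"


def classify_archaic_lineage(human_derived_freq, denisovan, neanderthals):
    """Classify a site as a change on the archaic lineage."""
    archaics = [denisovan] + list(neanderthals)
    n_derived = sum(1 for state in archaics if state == "der")
    # Fixed: all three archaics derived, humans ancestral at more than 99.999%.
    if n_derived == 3 and human_derived_freq < 1e-5:
        return "fixed"
    # HF: derived allele below 1% in humans, at least two archaics derived.
    if n_derived >= 2 and human_derived_freq < 0.01:
        return "high-frequency"
    return "other"
```

For example, `classify_human_lineage(0.95, "anc", ["anc", "der"])` falls into the high-frequency class, since the Denisovan and one Neanderthal carry the ancestral allele.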

A ranking of mutation density was performed for genes with protein-coding sequences and their genomic regions as retrieved from Ensembl. For each gene, unique associated changes as predicted by VEP were counted. A ranking on the number of HF changes per gene length was performed for all genes that span at least 5,000 bp in the genome and carry at least 25 segregating sites in the dataset (at any frequency in humans or in archaics), in order to remove genes which are very short or poor in mutations. The top 5% of the empirical distribution was defined as putatively enriched for changes on each lineage. The ratio of lineage-specific HF changes was calculated for the subset of genes where at least 20 lineage-specific HF changes were observed on the human and the archaic lineages combined. The top 10% of the empirical distribution was defined as putatively enriched for lineage-specific changes.
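A minimal Python sketch of the density ranking follows. The input layout (a dictionary mapping a gene name to its length, number of segregating sites and number of HF changes) is hypothetical, and the handling of ties and of rounding at the 5% boundary is an assumption, since the text only specifies the top 5% of the empirical distribution.

```python
def rank_genes_by_density(genes, min_length=5000, min_sites=25, top_percent=5):
    """Rank genes by HF changes per base pair and flag the top percentage.

    genes: dict mapping gene name -> (length_bp, n_segregating_sites, n_hf_changes).
    Returns the ranked list of eligible genes and the set flagged as enriched.
    """
    # Remove genes that are very short or poor in mutations.
    density = {
        name: n_hf / length
        for name, (length, n_sites, n_hf) in genes.items()
        if length >= min_length and n_sites >= min_sites
    }
    ranked = sorted(density, key=lambda name: density[name], reverse=True)
    # Assumption: the cutoff is the ceiling of top_percent of the eligible genes.
    n_top = (len(ranked) * top_percent + 99) // 100
    return ranked, set(ranked[:n_top])
```

The same function could be applied to the ratio of lineage-specific HF changes by substituting that ratio for the density and using a 10% cutoff, again as an illustration only.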

We performed enrichment tests using the R packages ABAEnrichment58 and DescTools199. We used the NHGRI-EBI GWAS Catalog55, and overlapped the associated genes with protein-coding changes on the human and archaic lineages, respectively. We performed an enrichment test as described elsewhere200: We counted the number of HF missense changes on each lineage and the subset of those associated with each trait (“Disease trait”), and performed a significance test (G-test) against the number of genes associated with each trait, and all genes in the genome, with a P value cutoff at 0.1. This suggests a genome-wide enrichment of changes for each trait. We then performed a G-test between the numbers of HF missense changes on each lineage, and the subset of each associated with each trait (P-value cutoff at 0.1), to determine a difference between the two lineages. We then performed an empirical test by creating 1,000 random sets of genes with similar length as the genes associated with each trait, and counting the overlap with the lineage-specific missense changes. At least 90% of these 1,000 random sets were required to contain fewer missense changes than the real set of associated genes. Only traits for which at least 10 associated loci were annotated were considered.
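For readers unfamiliar with the G-test, the following Python sketch computes the statistic and its P value for a 2×2 table using only the standard library. It is not the DescTools implementation, and the layout of the table (for example, trait-associated versus remaining HF missense changes in one row, and trait-associated versus remaining genes in the other) is an assumption about how the counts described above are arranged.

```python
import math


def g_test_2x2(a, b, c, d):
    """G-test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (G, p), where p is the upper tail of a chi-square distribution
    with one degree of freedom, computed as erfc(sqrt(G / 2)).
    """
    table = [[a, b], [c, d]]
    total = a + b + c + d
    row_sums = [a + b, c + d]
    col_sums = [a + c, b + d]
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # empty cells contribute nothing to G
                expected = row_sums[i] * col_sums[j] / total
                g += 2.0 * observed * math.log(observed / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding errors
    return g, math.erfc(math.sqrt(g / 2.0))
```

With the cutoff used here, a trait would pass this step if the returned P value is below 0.1.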

Gene Ontology (GO) enrichment was performed using the software FUNC201, with a significance cutoff of the adjusted p-value < 0.05 and a family-wise error rate <0.05. When testing missense changes, a background set of synonymous changes on the same lineage was used for the hypergeometric test. When testing genes with relative mutation enrichment, the Wilcoxon rank test was applied. Enrichment for sequence-specific DNA-binding RNA polymerase II transcription factors and transcription factor candidate genes from57, and genes interacting at the centrosome-cilium interface71 was tested with an empirical test in which 1,000 random sets of genes were created that matched the length distributions of the genes in the test list. The same strategy was applied for genes expressed in the developing brain (Table S10)59. Protein-protein interactions were analyzed using the STRING online interface v10.5202 with standard settings (medium confidence, all sources, query proteins only) as of January 2018. The overlap with selective sweep screens considers HHMCs within 50,000 bp of the selected regions20,33,36.
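The empirical test based on 1,000 length-matched random gene sets can be sketched in Python as follows. Several details are assumptions made for illustration: "similar length" is taken to mean within 10% of the length of the matched gene, genes are drawn without replacement, the closest unused gene serves as a fallback when no candidate lies within the tolerance, and all function and parameter names are hypothetical.

```python
import random


def length_matched_set(test_genes, gene_lengths, rng, tolerance=0.1):
    """Draw one random gene set whose lengths mirror those of test_genes."""
    picked = set()
    for gene in test_genes:
        target = gene_lengths[gene]
        pool = [g for g, length in gene_lengths.items()
                if g not in picked and abs(length - target) <= tolerance * target]
        if not pool:
            # Fallback (assumption): take the unused gene closest in length.
            unused = [g for g in gene_lengths if g not in picked]
            pool = [min(unused, key=lambda g: abs(gene_lengths[g] - target))]
        picked.add(rng.choice(pool))
    return picked


def empirical_enrichment(test_genes, gene_lengths, hit_counts, n_sets=1000, seed=0):
    """Return the observed hit count and the fraction of random sets with fewer hits.

    hit_counts: dict mapping gene name -> number of lineage-specific changes.
    """
    rng = random.Random(seed)
    observed = sum(hit_counts.get(g, 0) for g in test_genes)
    fewer = 0
    for _ in range(n_sets):
        sample = length_matched_set(test_genes, gene_lengths, rng)
        if sum(hit_counts.get(g, 0) for g in sample) < observed:
            fewer += 1
    return observed, fewer / n_sets
```

Under the criterion stated for the trait analysis, a gene list would count as enriched when the returned fraction is at least 0.9.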