We applied a whole-genome shotgun strategy to sequence the genome of an individual male naked mole rat (NMR) (Table 1 and Supplementary Tables 1–3). The sequencing depth of 98.6% of the genome assembly was more than 20-fold (Supplementary Figs 1–4). The mitochondrial genome was also assembled. Approximately 25% of the NMR genome was represented by transposon-derived repeats, which is lower than in other mammals (40% in human, 37% in mouse, and 35% in rat genomes) (Supplementary Tables 4 and 5, Supplementary Figs 5–7). The predicted NMR gene set included 22,561 genes (Table 1 and Supplementary Table 6), which is comparable to other mammals (22,389 in human, 23,317 in mouse, and 22,841 in rat). Of these, 21,394 (94.8%) genes were transcribed (based on the RNA-seq data for seven organs). More than 98% of NMR genes could be functionally annotated using homology approaches (Supplementary Table 7), and the quality of predicted genes was comparable to that of well-annotated mammalian genomes (Supplementary Tables 6 and 8 and Supplementary Fig. 8).

Table 1 Global statistics of the NMR genome

Most of the NMR genome (93%) showed synteny to human, mouse or rat genomes (Supplementary Table 9), and pairwise comparisons suggested a relatively low rate of NMR genome rearrangements after the split from the murid common ancestor. We defined common synteny blocks in human, mouse, rat and NMR genomes and identified segmental duplications and lineage-specific insertions and deletions (Supplementary Tables 10 and 11 and Supplementary Fig. 9). By analysing single-copy orthologous groups, we constructed a phylogenetic tree involving the NMR and other mammals (Fig. 1). As expected, the NMR placed within rodents and its ancestor split from the ancestor of rats and mice approximately 73 million years ago, whereas the ancestor of NMR, mouse and rat split from rabbit approximately 86 million years ago. Thus, in spite of some exceptional traits, the overall properties of the NMR genome appeared to be similar to those of other mammals.

Figure 1: Relationship of the NMR to other mammals.

a, Estimation of the time of divergence (with error range shown in parentheses) of the NMR and six other mammals based on orthology relationships. Distances are shown in millions of years. b, Expansion and contraction in gene families. Numbers designate the number of gene families that have expanded (green) and contracted (red) since the split from the common ancestor. The most recent common ancestor (MRCA) has 10,455 gene families.

Lineage-specific gene family expansions may be associated with the emergence of specific functions and physiology. Compared to other mammals, the NMR showed a moderate number of gene families under expansion and contraction (Fig. 1b), including 96 NMR lineage-specific gene families (Fig. 2). Analysis of syntenic regions identified 750 gained and 320 lost NMR genes (Supplementary Tables 12–14). At least 75.5% of genes gained showed evidence of transcription, and the lost genes were enriched for ribosome and nucleoside biosynthesis functions (Supplementary Table 15). We also identified 244 pseudogenes, containing 183 frameshift and 119 premature termination events (Supplementary Tables 16 and 17). Functional categories enriched for pseudogenes included olfactory receptor activity (GO:0004984, P < 0.001, Fisher’s exact test, 36 genes), visual perception (GO:0007601, P = 0.015, CRB1, CRYBB3, GNAT2, GRK7, GUCA1B and PDE6H), spermatogenesis (GO:0007283, P = 0.044, ADAM29, ADC, CCIN, CCT6B, DEDD, OAZ3 and SHBG), and possibly RING domain (SM00184, P = 0.142, CNOT4, KCNRG, RNF5, TRIM17, TRIML1 and ZSWIM2). The enrichment in the visual perception category appears to underlie the evolution of poor vision in the NMR, whereas many RING-domain-containing proteins act as ubiquitin ligases12. The levels of ubiquitinated proteins in NMRs are lower than in mice and, unlike those in mice, do not change significantly with age6.
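The enrichment P values above come from Fisher's exact test. A minimal sketch of a one-sided version of that test is given below, written as the upper tail of the hypergeometric distribution. The pseudogene counts (36 of 244) are taken from the text; the background counts (`N_BACKGROUND`, `K_CATEGORY`) are hypothetical placeholders, since the text does not report them, and the text does not say whether a one-sided or two-sided test was used.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test, P(X >= k) under the hypergeometric null.

    k: genes in the list that carry the annotation
    n: size of the list (here, pseudogenes)
    K: genes in the background that carry the annotation
    N: size of the background
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of observing k or more annotated genes in the list.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# From the text: 36 of 244 pseudogenes fall under olfactory receptor activity.
K_OBSERVED = 36
N_PSEUDOGENES = 244
# Hypothetical background, NOT from the paper: 400 annotated genes out of 20,000.
K_CATEGORY = 400
N_BACKGROUND = 20_000

p_olfactory = fisher_enrichment_p(K_OBSERVED, N_PSEUDOGENES, K_CATEGORY, N_BACKGROUND)
print(p_olfactory < 0.001)
```

With these placeholder background counts only about five olfactory genes would be expected among 244 pseudogenes by chance, so 36 gives a very small P value; the actual value depends on the real background.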

Figure 2: Common and unique NMR gene families.

This Venn diagram shows unique and overlapping gene families in the NMR (H. glaber), rat (R. norvegicus), mouse (M. musculus) and human (H. sapiens).

Identification of genes that have undergone positive selection in the NMR lineage can provide useful pointers to the evolution of its unique traits. We identified 45 genes (0.4%) as positively selected in the NMR lineage at a false discovery rate of 0.01 and 141 genes (1.2%) at a false discovery rate of 0.05 (Supplementary Table 18). Of the 45 genes identified at the false discovery rate of 0.01, 12 passed a strict manual inspection for alignment quality. In comparison, 0.7% of genes were predicted to be positively selected in the human lineage from high-quality alignments and using Rom correction for multiple testing13. Interestingly, our set included TEP1, encoding a telomerase component, and TERF1, a telomeric repeat binding factor identified at the false discovery rate of 0.05 (Supplementary Fig. 10). The TERF1 gene product is one of six proteins contributing to the shelterin complex, which shapes and protects telomeres14 and has been proposed to regulate telomere length15.
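To illustrate how two false discovery rate thresholds yield nested gene sets, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of P values. This is an assumption made for illustration: the text does not state which procedure was used to control the false discovery rate for the NMR genes, and the P values shown are invented.

```python
def benjamini_hochberg(pvalues, fdr):
    """Return the indices of tests called significant at the given FDR."""
    m = len(pvalues)
    # Rank tests from the smallest to the largest P value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose P value is under rank/m * FDR.
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Invented per-gene P values (for example, from branch-site likelihood ratio tests).
toy_pvalues = [0.0001, 0.004, 0.019, 0.03, 0.2, 0.6]

strict = benjamini_hochberg(toy_pvalues, fdr=0.01)
relaxed = benjamini_hochberg(toy_pvalues, fdr=0.05)
print(strict)   # [0]
print(relaxed)  # [0, 1, 2, 3]
```

As with the 45 and 141 genes reported above, every gene that passes the stricter threshold also passes the more relaxed one.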

To gain further insights into biological processes that underlie the exceptional traits of the NMR, we identified 39 NMR proteins containing 45 amino acid residues unique among orthologues present in 36 vertebrate genomes (Supplementary Table 19). This gene set included cyclin E1 (CCNE1), uncoupling protein 1 (UCP1) and γ-crystallin (CRYGS), which are associated with the G1/S transition during the cell cycle, thermogenesis and visual function, respectively. Other noteworthy genes were APEX1, a multifunctional DNA repair enzyme, RFC1, replication factor C, and TOP2A, a DNA topoisomerase that controls the topologic states of DNA during transcription. This set also contained eight genes designated as cancer-related16. Finally, TOP2A, along with TEP1 and TERF1 from the set of positively selected genes, are part of a five-protein complex of alternate lengthening of telomere pathway17. Overall, these analyses point to altered telomerase function in the NMR, which may be related to its evolution of extended lifespan and cancer resistance.
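One way to search for residues of this kind is to scan a multiple alignment column by column. The sketch below is a simplified illustration, not the procedure used in the study: it assumes that a position counts as unique when all other species share a single residue and the target species differs, and it skips gapped columns. The function name, species labels and the toy alignment are all invented.

```python
def unique_positions(alignment, target):
    """Find columns where all other species agree and `target` differs.

    alignment: dict mapping species name -> aligned protein sequence
               (equal lengths, '-' for gaps).
    Returns a list of (1-based column, target residue, residue in the others).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for col, residue in enumerate(alignment[target], start=1):
        others = {seq[col - 1] for name, seq in alignment.items() if name != target}
        # Skip columns with a gap in any species.
        if residue == "-" or "-" in others:
            continue
        # Require one conserved residue elsewhere and a different one in the target.
        if len(others) == 1 and residue not in others:
            hits.append((col, residue, next(iter(others))))
    return hits


# Invented toy alignment; real input would span many more species and columns.
toy_alignment = {
    "nmr":   "MKRWLT",
    "mouse": "MKGPLT",
    "rat":   "MKGPLT",
    "human": "MKGPLS",
}

print(unique_positions(toy_alignment, "nmr"))  # [(3, 'R', 'G'), (4, 'W', 'P')]
```

Column 6 is not reported because the other species disagree with each other there, so the NMR residue is not unique under this criterion.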

We also identified 1.87 million heterozygous single-nucleotide polymorphisms (SNPs). This results in an estimated nucleotide diversity (mean per nucleotide heterozygosity) of 7 × 10⁻⁴, which is much lower than in mouse and rat populations and is comparable to the nucleotide diversity observed in humans. Transition nucleotide changes were observed twice as often as transversions, indicating that variant calls reproduce the expected properties of natural variation in other mammals. This low level of nucleotide diversity may reflect a low effective size of the NMR population, but may also be due to a high level of inbreeding, a reduced mutation rate or high efficiency of the repair systems. The variation of diversity along the genome was consistent with inbreeding in the NMR population. In protein-coding regions of the genome, our analysis identified 10,951 non-synonymous and 8,616 synonymous SNPs. Their ratio is much higher than in other studied organisms, including human, which appears to signal relaxation of purifying selection in the NMR, potentially as a consequence of reduced effective population size. Finally, we analysed the context dependency of NMR SNPs (Supplementary Fig. 11). Relative rates of nucleotide changes and nucleotide context dependencies were similar to those observed in human polymorphism, with the exception of a relative reduction of SNPs due to CpG mutations. This was caused by a combination of the relatively low CpG density in the NMR genome and a higher fraction of CpG dinucleotides within CpG islands compared to the human genome. CpG density was only 0.19 of that expected on the basis of the GC content, which is lower than in human, dog and panda genomes, but is similar to the mouse genome. However, in comparison to mouse, a higher fraction of CpG dinucleotides was concentrated in CpG islands. CpG dinucleotides within CpG islands contribute less to genetic variation because of their lower methylation rate and possibly also due to selection.
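The figures in this paragraph can be checked with simple arithmetic. The sketch below is a rough consistency check under stated assumptions: the denominator for heterozygosity is taken to be the 2.7 Gb scaffold total from the Methods Summary (the text does not say how many bases were callable), and the CpG observed/expected ratio uses the common definition, CpG count × length / (C count × G count), which may differ in detail from the calculation behind the 0.19 value.

```python
# Counts reported in the text.
HET_SNPS = 1_870_000
NONSYN_SNPS = 10_951
SYN_SNPS = 8_616

# Assumption: the 2.7 Gb scaffold total serves as the number of sites surveyed.
ASSUMED_GENOME_BP = 2_700_000_000


def per_site_heterozygosity(het_snps, genome_bp):
    """Heterozygous sites per nucleotide in a single diploid individual."""
    return het_snps / genome_bp


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio given the C and G content of `seq`."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


pi = per_site_heterozygosity(HET_SNPS, ASSUMED_GENOME_BP)
ns_to_s = NONSYN_SNPS / SYN_SNPS
print(f"heterozygosity ~ {pi:.1e}")                    # 6.9e-04
print(f"non-synonymous : synonymous ~ {ns_to_s:.2f}")  # 1.27
print(cpg_obs_exp("CCGG"))                             # 1.0
```

Under this assumption the heterozygosity rounds to the reported 7 × 10⁻⁴, and non-synonymous SNPs outnumber synonymous ones by roughly 1.27 to 1.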

Long lifespan is a key feature of the NMR. To study ageing and longevity, we obtained RNA-seq data for brain, liver and kidney of newborn, young adult (4-year-old) and old adult (20-year-old) NMRs (Supplementary Table 20). In contrast to other mammals, few genes showed differential expression between 4- and 20-year-old NMRs, especially in the brain (Supplementary Tables 21–23). A recent study identified 33 underexpressed and 21 overexpressed genes in the human brain during ageing18. Of these, 32 genes did not show consistent expression changes with ageing in NMRs, including 30 genes that had stable expression and two genes that changed in the opposite direction compared to human brain (Supplementary Table 21). For example, CYP46A1 and SMAD3 were downregulated in the human brain, but showed elevated expression in the NMR brain. The product of the CYP46A1 gene is a mediator of cholesterol homeostasis that influences the tendency of Aβ to aggregate. The product of SMAD3 is a modulator of TGF-β signalling, playing a role in cancer development by slowing down the rate of cell proliferation. Elevated expression of SMAD3 in the NMR during ageing may help optimize the rate of cell death, protecting NMRs from cancer.

A previous meta-analysis of age-related gene expression in mice, rats and humans revealed 56 consistently overexpressed and 17 underexpressed genes19. However, many of these genes did not show the same expression changes in NMRs, suggesting that different regulatory mechanisms may underlie NMR longevity (Supplementary Tables 22 and 23). For example, genes related to degradation of macromolecules, such as GSTA1, DERL1 and GNS, were not upregulated with age in NMRs. We also found that genes encoding mitochondrial proteins (NDUFB11, ATP5G3 and UQCRQ) were not downregulated, consistent with stable maintenance of mitochondrial function during ageing. It is also of interest that TERT (telomerase reverse transcriptase) showed stable expression regardless of age (Supplementary Fig. 12). This finding is consistent with the role of the telomerase complex, highlighted by positive selection on TEP1 and TERF1. Overall, transcriptome and sequence data revealed patterns in NMR genes that differ from those in humans, mice and rats and that may underlie longevity mechanisms in this animal.

Non-shivering thermogenesis is a major heat production process in mammals that mainly depends on the action of UCP1, one of the 39 vertebrate genes that changed uniquely in the NMR (Supplementary Table 19). UCP1 featured changes in amino acids Gln 146, Arg 263, Trp 264 and Thr 303, with the latter two residues being subject to positive selection (P < 0.05, likelihood ratio test for the branch-site model, n = 30) and Arg 263 and Trp 264 located in the conserved nucleotide binding motif (Fig. 3a). With Arg–Trp instead of the rigid Gly–Pro in the key regulatory site, UCP1 is expected to lose the tight regulation by purine nucleotides as inhibitors and fatty acids as activators (Fig. 3b and c). The same loop also features two positively charged Lys residues followed by a negatively charged residue (also a unique combination), that should markedly affect the local electrostatic potential of UCP1. In addition, Gln 146 replaced a conserved His involved in proton transport, and the same mutation was shown to decrease proton conductance of UCP1 fivefold20. Thr 303 is located in the carboxy-terminal motif (RqTxDCxT) required for binding purine nucleotides21. Taken together, these observations indicate a tight association of UCP1 function with the unique thermoregulation of the NMR22.

Figure 3: Unique changes in UCP1 sequences and their roles in thermoregulation.

a, Alignment of mammalian UCP1 sequences. Amino acids unique to the NMR are highlighted in red, and conserved motifs in blue. b, Topology of UCP1. Regions affected in the NMR are highlighted. c, Structural model of UCP1. Location of the channel and the nucleotide-binding loop with altered sequences in the NMR are shown.

In mammals, switches between light and dark periods affect synthesis of the hormone melatonin, which modulates sleep and circadian rhythms. NMRs live in a naturally dark habitat and their pineal glands, where melatonin is synthesized, are atrophied23, but we found that the genes involved in melatonin synthesis (TPH1, TPH2, DDC, AANAT and ASMT) are intact. Interestingly, the expression of genes involved in the final two steps of melatonin synthesis was very low (AANAT) or undetectable (ASMT) in the NMR brain regardless of age (Supplementary Table 24 and Supplementary Fig. 13). Moreover, two major mammalian melatonin receptors (MTNR1A and MTNR1B, encoding MT1 and MT2, respectively) were inactivated by mutations that introduce premature stop signals (Supplementary Fig. 14). Synteny analyses showed that these pseudogenes corresponded to mouse MTNR1A and MTNR1B. Although melatonin signalling appears to be disrupted in the NMR, its circadian rhythms were maintained in terms of locomotor activity and body temperature when exposed to periodic light/dark changes24. Our finding is consistent with a previous report that MT1/MT2 knockout mice maintained essentially normal circadian rhythms25. These mice also showed decreased insulin secretion25. Likewise, our transcriptome analysis of the NMR revealed decreased expression of genes involved in insulin/IGF-1 signalling in the liver compared to mice (Supplementary Fig. 15).

To explain the extraordinary resistance of the NMR to cancer3, a two-tier protective mechanism involving contact inhibition mediated by p16Ink4a and p27Kip1 was proposed4. The involvement of p16Ink4a is unusual, since humans and mice show only contact inhibition mediated by p27Kip1. We analysed the gene locus and the transcriptome reads corresponding to tumour suppressors p16Ink4a and p19Arf. As in mice, the p16Ink4a transcript consists of three exons (Supplementary Fig. 16). However, sequence similarity in the last exon is low, and two early stop codons in the second exon were predicted to result in a shorter, 14-kDa protein (Supplementary Fig. 17). The four ankyrin repeats were, however, intact and Thr69, a residue important for CDK6 binding, was conserved, so the function of the protein may be partially preserved (Supplementary Fig. 18). The p19Arf transcript consists of two exons, but four stop codons in the second exon should lead to a shorter, 10-kDa protein (Supplementary Figs 19–21).
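Early stop codons of the kind described for p16Ink4a and p19Arf can be located by reading a coding sequence codon by codon. The sketch below is a minimal illustration, assuming an in-frame coding sequence that starts at the first codon and ends with its normal terminator; the function name and the toy sequences are invented and are not NMR sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds):
    """Return 1-based codon positions of in-frame stops before the final codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # The last codon is treated as the normal terminator and is not reported.
    return [pos for pos, codon in enumerate(codons[:-1], start=1)
            if codon in STOP_CODONS]


# Invented toy sequences.
intact = "ATGGCCAAATAA"        # ATG GCC AAA TAA
truncated = "ATGGCCTGAAAATAA"  # ATG GCC TGA AAA TAA

print(premature_stops(intact))     # []
print(premature_stops(truncated))  # [3]
```

The position of the first reported stop gives the length of the predicted truncated product in codons, which is the kind of information needed to estimate a shortened protein size.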

The NMR is also unique in that its skin and cutaneous C-fibres lack the neuropeptide Substance P, making this animal insensitive to certain types of pain10,11. Our analysis revealed the presence of intact TAC1 encoding Substance P. However, the NMR had a deletion in the core promoter region highly conserved among mammals (Supplementary Fig. 22). Thus, this neurotransmitter appears to be functional but may be under unique regulation.

We further examined the molecular basis for poor visual function and small eyes in the NMR. Of the four vertebrate opsin genes (RHO, OPN1LW, OPN1MW and OPN1SW), two (OPN1LW and OPN1MW) were missing (Table 2); this distinguishes the NMR from other rodents with dichromatic colour vision, such as mice, rats and guinea pigs. However, the NMR has intact RHO (rhodopsin) and OPN4 (melanopsin), supporting the presence of rod-dominated retinae and the capacity to distinguish light/dark cues. Of about 200 genes associated with visual perception (GO:0007601) in humans and mice, almost 10% were inactivated or missing in the NMR (Table 2 and Supplementary Fig. 23). These mammalian genes participate in crystallin formation, phototransduction in the retina, retinal development, dark adaptation, night blindness and colour vision. For at least ten of these genes, we observed relaxation of the functional constraint on NMR sequences by estimating the ratio of non-synonymous to synonymous substitutions, which corroborated the dysfunction of these genes. Inactivation of CRYBA4, a microphthalmia-related gene, may be associated with the small-sized eyes, whereas inactivation of CRYBA4 and CRYBB3 and an NMR-specific mutation in CRYGS (Supplementary Table 19) may be associated with abnormal eye morphology26. Thus, while some genes responsible for vision are preserved in the NMR, its poor visual function may be explained by deterioration of genes coding for various critical components of the visual system.

Table 2 Visual perception genes that are inactivated or are missing in the NMR genome

Further analysis revealed substantial divergence of the NMR nuclear receptor corepressor Hairless from other mammalian orthologues and the presence of amino acid replacements associated with the hairless phenotype, which is consistent with the lack of fur in NMRs (Supplementary Fig. 24). In addition, we found substantial sequence variation in the sweet taste receptor and lack of many bitter taste receptors common to other mammals (Supplementary Figs 25 and 26). In particular, the NMR appears to lack the phenylthiocarbamide taste, a dominant genetic trait in humans, as well as several other common bitter tastes.

Air in NMR burrows is low in O2 (8%) and high in CO2 (>10%) owing to many animals sharing a limited air supply and poor gas exchange through soil27. To cope with the low O2 conditions, the NMR has developed adaptive circulatory (altered haemoglobin oxygen affinity) and metabolic functions, reducing metabolic rate and slowing down development1,8,28,29. To obtain insights into this adaptation, we examined gene expression changes in several tissues of NMRs subjected to 8% O2 for one week (Supplementary Tables 25–31 and Supplementary Figs 27–30). Many changes associated with energy metabolism and redox control were observed. Sequence analysis of NMR hypoxia-inducible factor 1α (HIF1α) revealed a T407I exchange unique among mammals and located in the VHL-binding domain. Under normal oxygen conditions, VHL mediates ubiquitin-dependent degradation of HIF1α. In addition, NMR VHL harbours a V166I exchange at a functionally important site. These amino acid changes are consistent with relaxation of ubiquitin-dependent degradation of HIF1α, and, thus, with adaptation to low oxygen conditions.

To summarize, sequencing and analysis of the NMR genome revealed numerous insights into the biology of this remarkable animal. In addition, this genome and the associated data sets offer the research communities working in ageing, cancer, eusociality and many other areas a rich resource that can be mined in numerous ways to uncover the molecular bases for the extraordinary traits of this most unusual mammal. In turn, this information provides unprecedented opportunities for addressing some of the most challenging questions in biology and medicine, such as mechanisms of ageing, the role of genetic makeup in regulating lifespan, adaptations to extreme environments, hypoxia tolerance, thermogenesis, resistance to cancer, circadian rhythms, sexual development and hormonal regulation.

Methods Summary

The NMR genome was sequenced on the Illumina HiSeq 2000 platform. The sequenced individual male NMR was from a captive breeding colony located at the University of Illinois, Chicago. The genome was assembled using SOAPdenovo. We obtained 2.5 Gb (gigabase pairs) contig sequences with N50 19.3 kb (kilobase pairs) and N90 4.7 kb, and 2.7 Gb scaffold sequences with N50 1.6 Mb (megabase pairs) and N90 0.3 Mb. (The N50 (or N90) contig size is the length of the smallest contig S in the sorted list of all contigs where the cumulative length from the largest contig to contig S is at least 50% (or 90%) of the total assembly length.) RNA-seq data (ageing and low O2 experiments) were for animals from the same colony. See Supplementary Information for data analysis and additional details.
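The N50 and N90 definition given above translates directly into a short routine. The sketch below is illustrative only: the function name `nx` and the toy contig lengths are invented, and the reported values (for example, contig N50 of 19.3 kb) come from the actual assembly rather than from this code.

```python
def nx(lengths, percent):
    """Return the Nx length (N50 for percent=50, N90 for percent=90).

    Contigs are sorted from largest to smallest; the result is the length of
    the contig at which the cumulative length first reaches `percent` of the
    total assembly length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length


# Invented toy assembly with a total length of 300.
toy_contigs = [40, 100, 20, 80, 60]

print(nx(toy_contigs, 50))  # 80
print(nx(toy_contigs, 90))  # 40
```

In the toy assembly, the two largest contigs (100 and 80) together reach 180, which is at least half of 300, so the N50 is 80; reaching 90% requires the four largest contigs, so the N90 is 40.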