Genomic analysis of snub-nosed monkeys (Rhinopithecus) identifies genes and processes related to high-altitude adaptation

Li Yu, Ya-Ping Zhang, Chung-I Wu and colleagues report the de novo genome of the black snub-nosed monkey Rhinopithecus bieti and the genomic sequences of four other Rhinopithecus species, including three high-altitude and two lowland species. The species- and population-level genomic analyses as well as the transcriptomic analysis and functional assays find adaptive signatures associated with adaptation to high altitude. The snub-nosed monkey genus Rhinopithecus includes five closely related species distributed across altitudinal gradients from 800 to 4,500 m. Rhinopithecus bieti, Rhinopithecus roxellana, and Rhinopithecus strykeri inhabit high-altitude habitats, whereas Rhinopithecus brelichi and Rhinopithecus avunculus inhabit lowland regions. We report the de novo whole-genome sequence of R. bieti and genomic sequences for the four other species. Eight shared substitutions were found in six genes related to lung function, DNA repair, and angiogenesis in the high-altitude snub-nosed monkeys. Functional assays showed that the high-altitude variant of CDT1 (Ala537Val) renders cells more resistant to UV irradiation, and the high-altitude variants of RNASE4 (Asn89Lys and Thr128Ile) confer enhanced ability to induce endothelial tube formation in vitro. Genomic scans in the R. bieti and R. roxellana populations identified signatures of selection between and within populations at genes involved in functions relevant to high-altitude adaptation. These results provide valuable insights into the adaptation to high altitude in the snub-nosed monkeys.

Rhinopithecus genus offers an interesting study model not only to investigate the adaptive mechanisms of non-human primates to high altitude but also to examine the differences and similarities between the adaptive mechanisms in different high-altitude regions.
Here we carried out de novo sequencing and genome assembly of R. bieti using the Illumina HiSeq 2000 platform, generating ~270 Gb of genomic sequence with 76.5-fold coverage (Supplementary Table 1 Table 2). The pattern of GC content for R. bieti is similar to those for macaque and human (Supplementary Fig. 1). We found genomic synteny between R. bieti and rhesus macaque (Macaca mulatta 13 ) (Fig. 2a) and identified 21,812 recent segmental duplications with a total length of 72.1 Mb (2.43%) in the R. bieti genome (Fig. 2b).
The R. bieti genome is predicted to contain 22,834 protein-coding genes (Supplementary Fig. 2). We identified 1,187 gene families that were expanded in R. bieti in comparison to M. mulatta (Fig. 2c). InterPro classification of the genes from 231 significantly expanded gene families showed significant enrichment (P < 0.01) of these genes in categories involved in DNA repair and damage response (94 genes) and oxidative phosphorylation processes (107 genes) (Supplementary Table 3). This finding may be related to the increased exposure of snub-nosed monkeys to UV radiation and the potentially increased rate of energy metabolism required for high-altitude survival [14][15][16][17][18] . However, caution is needed in interpreting gene family expansions and contractions that were identified by comparing genomes assembled using different methods.
In addition to comparative genomic analyses, we performed a comparative transcriptomic analysis of R. bieti and M. mulatta based on ~240 Gb of RNA-seq data from the same 11 tissues in both species (Supplementary Table 4 and Supplementary Note). We observed that the expression profiles of highly expressed genes from the digestive system (small intestine, large intestine, and stomach) and energyconsuming tissues (heart and skeletal muscle) were more similar in  l e t t e r s the same species than the expression patterns for the same tissue in the different species (Fig. 3), reflecting the genetic discrepancy between R. bieti and other primates with respect to dietary and high-altitude adaptation, respectively. Furthermore, analyses of highly expressed genes showed that those involved in the oxidative phosphorylation pathway (KEGG database, ko00190; 28 genes; P = 8.03 × 10 −16 ) and the cardiac muscle contraction pathway (KEGG database, ko04260; 10 genes; P = 1.88 × 10 −5 ) were significantly enriched for expression in energy-consuming tissues from R. bieti, indicating the possible involvement of energy metabolism and heart function in high-altitude adaptations of the snub-nosed monkeys.
To investigate the adaptive mechanisms of snub-nosed monkeys from different high-altitude regions, we performed a comparative genomic analysis of all snub-nosed monkeys by including resequencing genomes for R. brelichi, R. strykeri, and R. avunculus (30-fold coverage on average per species, with one individual per species; Supplementary Table 5) as well as the recently published R. roxellana genome 4 . Genomic sequence data for R. avunculus, with only 250 individuals in the wild, have to our knowledge not previously been reported. The genetic diversity among these snub-nosed monkeys is between 0.014% and 0.068%. R. brelichi has the highest diversity (0.068%), whereas R. strykeri has the lowest diversity (0.014%). R. roxellana (0.054%) and R. avunculus (0.044%) show similar levels of diversity, with the diversity in both species greater than that in R. bieti (0.030%) (Supplementary Fig. 3). On the basis of 69,586,645 SNPs in the five snub-nosed monkeys and macaque, we constructed a phylogenetic tree. Regardless of the reference genomes and treebuilding methods used, the genome phylogeny strongly supported the interspecific relationships shown in a previous tree based on the analyses of several nuclear genes ( Fig. 1) and significantly rejected the relationships represented in a tree based on the mitochondrial genome 12 (P = 1 × 10 −40 in an approximately uniform test). Estimation of divergence times suggests that the two high-altitude 'Himalaya' species R. bieti and R. strykeri diverged from the lowland R. avunculus species 0.80 million years ago and that the high-altitude northern species R. roxellana separated from the lowland R. brelichi species 1.07 million years ago ( Fig. 1 and Supplementary Note). We then identified shared amino acid substitutions that occurred along the lineage containing R. bieti and R. strykeri and the lineage leading to R. roxellana, but not along the lineages leading to the low-altitude R. avunculus and R. brelichi species. Overall, 20 common amino acid substitutions for 18 genes were observed in all highaltitude snub-nosed monkeys (Supplementary Table 6), with this number of substitutions not resulting from chance (P = 3.3 × 10 −16 ). Of the common substitutions identified, 18 occurred in 16 genes that were also identified as evolving under positive selection on the basis of likelihood-ratio tests for these two lineages (Supplementary Table 7). Positively selected genes had a significantly higher ratio of nonsynonymous substitutions to synonymous substitutions than other genes in the high-altitude snub-nosed monkeys (P = 3.84 × 10 −3 for R. strykeri, 3.45 × 10 −6 for R. bieti, and 5.10 × 10 −8 for R. roxellana) (Supplementary Table 8). Further examination of these 16 genes found that 6 genes, including ARMC2, NT5DC1, RNASE4, CDT1, RTEL1, and DNAH11, have functional associations with lung function, angiogenesis, DNA repair, or respiratory cilia movement, suggesting a role in the high-altitude adaptation of the snub-nosed monkeys [19][20][21][22][23][24] (Table 1). Moreover, when the common substitutions for these six genes were examined in more mammalian species with publicly available genomes, no identical substitutions were found in those species (Supplementary Table 9 and Supplementary Note).
To test the potential functional effects of these substitutions, we performed experimental validation of the substitutions identified in CDT1 (p.Ala537Val) and RNASE4 (p.Asn89Lys and p.Thr128Ile) (Fig. 4). CDT1 is a licensing factor for DNA replication that is tightly controlled to maintain genome integrity, and the encoded protein is rapidly degraded by the SCF Skp2 complex upon UV-induced DNA damage 25 . Using plasmids encoding human CDT1 proteins in a HeLa cell culture model, our UV irradiation assays showed that CDT1 Ala537Val was more stable than the reference CDT1 protein after irradiation with 100 J/m 2 UV light (Fig. 4c,d), suggesting that the p.Ala537Val substitution might contribute to the increased resistance to UV radiation of the high-altitude snubnosed monkeys. RNASE4 is a member of the RNase superfamily and has been reported to induce angiogenesis 19 Figure 3 Heat map of genes enriched for expression in the tissues of R. bieti and M. mulatta. The heat map was generated using hierarchical clustering and complete linkage of the top 500 most highly expressed one-to-one orthologous genes from R. bieti and M. mulatta. Distances, representing the relative similarity among genes and tissues, were calculated using Pearson's correlation coefficients. Color represents FPKM value of gene expression after scaling and centering. npg l e t t e r s cloned into an Escherichia coli expression vector. The purity of active proteins was demonstrated by Coomassie blue staining (Fig. 4e).
We then examined the ability of cells to induce endothelial tube formation after addition of recombinant reference or variant (Asn89Lys and Thr128Ile) RNASE4 protein. Variant RNASE4 (Asn89Lys and Thr128Ile; 1 µg/ml concentration) showed higher activity in inducing tube formation in human umbilical vein endothelial cell (HUVEC) cultures in Matrigel than reference RNASE4 (Fig. 4f,g), indicating that the Asn89Lys and Thr128Ile variants may enhance angiogenesis for high-altitude adaptation in snub-nosed monkeys. The fact that heat inactivation or proteolysis abolishes RNASE4 protein activities suggests that intact peptide sequence and structure are also important (data not shown). In addition to species-level investigation, we performed population-level genomic scans for selection in R. bieti (n = 20; ~10-fold coverage per individual) and R. roxellana (n = 26; ~12.29-fold coverage per individual) populations (Supplementary Tables 10-13). Three R. roxellana populations from Sichuan and Gansu (SG; n = 12; 2,500-3,500 m), Qinling (QL; n = 9; 2,000-3,000 m), and Shennongjia (SNJ; n = 5; 2,000-2,500 m), which showed obvious isolation and different demographic scenarios, were compared to one R. bieti population from Weixi (3,300-4,100 m) (Supplementary Fig. 4 and Supplementary npg l e t t e r s Note). These populations occupy different elevations at high altitude and may face different levels of hypoxia stress, similar to those experienced by other high-altitude mammals at comparable altitudes 27,28 . The high-altitude R. strykeri population was not included because of its small size, with fewer than 300 individuals in the wild, which led to sample collection difficulties. By identifying loci and pathways that were outliers using three methods to scan for selection (Supplementary Note), we detected 60 genes in the R. bieti population (3.12 million SNPs), 247 genes in the SG population (5.85 million SNPs), 313 genes in the QL population (4.37 million SNPs), and 269 genes in the SNJ population (4.16 million SNPs) evolving under selection. Inspection of the genes overlapping between populations (Supplementary Fig. 5) identified two genes-BRCC3 (R. bieti, SG, and QL) and RYR2 (R. bieti and SG)-related to DNA repair and heart function, respectively 29,30 (Table 2). Interestingly, RYR2 has also been reported as a key candidate gene related to high-altitude adaptation in Tibetan wolves 31 . In addition to the overlapping loci, we observed nine candidate high-altitude-related loci that were specific to particular populations, which may contribute to the development of unique and local adaptations [32][33][34][35][36][37][38][39][40][41][42][43][44] (Table 2). Interestingly, ATR (R. bieti specific), which functions in DNA repair, regulates the expression of hypoxia-inducible factor (HIF)-1α, and confers hypoxia tolerance in vitro [33][34][35] , was identified here for the first time to our knowledge in R. bieti as a candidate target of selection in high-altitude adaptation. ARNT (SG specific), whose protein product binds to HIF-α to regulate adaptation to hypoxia 37 , has been mentioned as a target of positive selection in the human Tibetan population 45 . ACE (SG specific), which functions in blood pressure regulation and carries variants responsible for high-altitude pulmonary hypertension (HAPH) 38 , has been reported to have a role in high-altitude adaptation of human populations [46][47][48] . In comparison, no significant signatures of selection were observed in hemoglobin genes, which have been identified as critical genes for oxygen binding and resistance to hypoxia in other high-altitude mammals [49][50][51][52][53] . The occurrence of population-specific selection signatures mostly in genes in the R. bieti and SG populations, along with most of the overlapping genes being shared by these two populations, highlights the greater extent of signatures for genomic adaptation to high altitude in R. bieti and SG than in the QL and SNJ populations. This observation is consistent with the fact that the R. bieti and SG populations inhabit the highest altitudes.
Our study extends the genomic analysis of high-altitude adaptation to a unique branch of the non-human primate evolutionary tree.
On the basis of a multilevel survey, the genomic sequence analyses from both the species level and population level, as well as the transcriptomic analyses and functional assays, find adaptive signatures that may reflect molecular adaptations consistent with the high-altitude environments of three snub-nosed monkey species, contributing to a new and more complete understanding of the complex biological features of high-altitude adaptation.

METhoDS
Methods and any associated references are available in the online version of the paper.
Accession codes. All raw data from de novo sequencing and resequencing are available through the NCBI Sequence Read Archive (SRA) under projects PRJNA247935, PRJNA283338, and PRJNA261768. RNA-seq reads are available under project PRJNA248058. The de novo genome assembly for R. bieti has been deposited in the DNA Databank of Japan/European Molecular Biology Laboratory/GenBank under project PRJNA315476.

COMPetING FINANCIAL INteReStS
The authors declare no competing financial interests. on the phylogenetic tree was designated as the foreground branch. Genes with 2∆ln L >3.84 in likelihood-ratio tests were designated as positively selected genes.
Plasmid construction, transfection, cell culture, UV irradiation assays, and immunoblot analysis. A sequence encoding C-terminally 3×FLAG-tagged human CDT1 was synthesized and cloned into the pCDH-MSCV-E2F-eGFP lentiviral vector. The high-altitude variant form of CDT1 (Ala537Val) was generated by mutagenesis PCR. All constructs were verified via sequencing. Lentiviruses were generated according to the manufacturer's protocol. HeLa cells and HEK293T cells were provided by the Kunming Cell Bank, Kunming Institute of Zoology, Chinese Academy of Sciences. The cell lines were verified to be free of mycoplasma contamination by fluorescent staining. HeLa cells were infected by the different lentiviruses and cultured in high-glucose DMEM supplemented with 10% FBS, 1% penicillin-streptomycin and 1 µg/l puromycin (Sigma). All cells were incubated in a humidified atmosphere with 5% CO 2 at 37 °C. For UV irradiation assays, HeLa cells stably expressing the different forms of CDT1-3×FLAG were synchronized by incubation with 50 ng/ml nocodazole (Sigma) for 16 h and then subjected to UV irradiation at 100 J/m 2 . Cells were collected at the indicated time points. Immunoblot analysis was performed according to the standard protocol. The antibodies used included rabbit antibody to CDT1 (Cell Signaling Technology, 8064; 1:500 dilution), mouse M2 antibody to FLAG (Sigma, F3165; 1:1,000 dilution), mouse antibody to GAPDH (ABclonal, AC002; 1:1,000 dilution), horseradish peroxidase (HRP)-conjugated goat anti-mouse antibody (Santa Cruz Biotechnology, sc-2055; 1:2,000 dilution), and HRP-conjugated goat anti-rabbit antibody (Santa Cruz Biotechnology, sc-2054; 1:2,000 dilution). Some cells were incubated with 100 µg/ml cycloheximide for the indicated periods of time.
In vitro angiogenesis assays. The coding sequences for RNASE4 and variant RNASE4 (Asn89Lys and Thr128Ile) from human were synthesized and cloned into the pET21b E. coli expression vector. Recombinant proteins were expressed in BL21 (DE3) cells and purified following the protocol used for the generation of Ang 89 . The ability of RNASE4 and RNASE4 (Asn89Lys and Thr128Ile) to induce endothelial tube formation was examined. Commercially available Ang (Abcam, 114413) and PBS were used as positive and negative controls, respectively. A 48-well plate was precoated with 100 µl/well of growthfactor-reduced Matrigel. HUVEC cells were purchased from the American Type Culture Collection (CRL-1730). Primary culture passage 3 HUVECs were seeded onto the Matrigel (2 × 10 4 cells/well) and cultured in the presence of recombinant reference or variant (Asn89Lys and Thr128Ile) RNASE4 for 4 h. Cells were then fixed with 3.7% paraformaldehyde and photographed. Images were quantified using ImageJ software for capillary-like structures. Experiments were performed in triplicate, and significance was determined by Student's t test (two-tailed).
Genome resequencing of R. bieti and R. roxellana populations. Libraries with an insert size of 500 bp were constructed to resequence R. bieti and R. roxellana individuals according to the Illumina protocol. All libraries were sequenced on the HiSeq 2000 platform (Supplementary Tables 10-13). In addition, a published resequenced R. bieti individual was also included in the present population genomic study 4 .

Scans of R. bieti and R. roxellana populations for selection.
Using BWA with default parameters 56 , raw paired-end reads from each of 20 R. bieti individuals were aligned to the R. bieti assembled genome (Rb0) and the reads from each of 26 R. roxellana individuals were aligned to the published R. roxellana genome 4 (Supplementary Tables 10-13). We performed SNP calling using SAMtools 57 (Supplementary Note). The θπ method was used to identify genomic regions as population outliers with VCFtools v0. 1.11 (ref. 90). Z H and LSBL scans were processed as described in Ai et al. 91 with the slipped windows strategy. Genes detected by at least two of the three scanning methods were identified as candidate genes under selection (Supplementary Note).
Population structure analyses and demographic history reconstruction. SNPs in scaffolds longer than 50 kb were extracted. To avoid the effect of linkage disequilibrium, we selected one SNP for each interval of 50 kb and ran Admixture 92 to perform genetic clustering. Demographic history was inferred using the pairwise sequentially Markovian coalescence (PSMC) model 93 (Supplementary Note). The generation time (g = 5 years) 94,95 and mutation rate (5 × 10 −9 mutations per generation) 96,97 were derived from previous studies.