Muntjac chromosome evolution and architecture

Despite their recent divergence, muntjac deer show striking karyotype differences. Here we describe new chromosome-scale genome assemblies for the Chinese and Indian muntjacs, Muntiacus reevesi (2n=46) and Muntiacus muntjak (2n=6/7), and analyze their evolution and architecture. We identified six fusion events shared by both species relative to the cervid ancestor and therefore present in the muntjac common ancestor, six fusion events unique to the M. reevesi lineage, and twenty-six fusion events unique to the M. muntjak lineage. One of these M. muntjak fusions reverses an earlier fission in the cervid lineage. Although comparative Hi-C analysis revealed differences in long-range genome contacts and A/B compartment structures, we discovered widespread conservation of local chromatin contacts between the muntjacs, even near the fusion sites. A small number of genes involved in chromosome maintenance show evidence for rapid evolution, possibly associated with the dramatic changes in karyotype. Analysis of muntjac genomes reveals new insights into this unique case of rapid karyotype evolution and the resulting biological variation.

probability arising, for example, by chromosomal proximity in the nucleus. This analysis points 1 to the importance of having multiple outgroups (here both B. taurus and C. elaphus) in 2 phylogenetic analyses of karyotypes. 3 4 Changes in three-dimensional genome structure after karyotype change. Despite the 5 extensive fusions documented above for M. muntjak and M. reevesi, the genomes are locally 6 very similar (98.5% identity in aligned regions and fourfold synonymous substitution rate of 7 1.3%). Our Hi-C chromatin conformation capture data allows us to examine the impact of these 8 rearrangements on local (i.e., within a megabase along the genome) and longer (i.e., 5 megabase) 9 length scales as chromosomal segments become juxtaposed in novel ways. Focusing first on the 1 0 M. muntjak and M. reevesi lineage-specific fusion sites (Tables S4-7), we note the maintenance 1 1 of distinct Hi-C boundaries in several examples, such as the junction between M. muntjak 1 2 chromosomes X and 3 at 133 Mb on chromosome 3_X. Other fusion sites, however, show no 1 3 notable difference from the rest of the genome in M. muntjak. As expected, M. reevesi shows a 1 4 clear distinction between intra-and inter-chromosomal contacts, including across fusion sites in 1 5 M. muntjak ( Figure 2). To quantify the chromatin changes at these fusion sites, we divided the 1 6 genomes into 1 Mb bins and compared normalized inter-bin Hi-C contact between bins 5 Mb 1 7 apart in the two species, using the M. muntjak assembly as the backbone for comparison ( Figure  1  8   S7). Confirming the initial visual analysis, we found that most bins containing a fusion site have 1 9 fewer long-range chromatin contacts in M. reevesi (averaging 0.16 ± 0.09 normalized contacts 2 0 per bin) compared with M. muntjak (averaging 0.62 ± 0.35 normalized contacts per bin), though 2 1 we identified bins with few contacts in both species ( Figure S7).

2
In order to test whether differences are present at a more local level, we next compared 1 normalized 1 Mb intra-bin Hi-C contacts between the two species, again using the M. muntjak 2 assembly as the backbone for comparison. We found that most of the chromatin contacts are 3 consistent between the two muntjacs, including all but three of the bins containing fusion sites 4 ( Figures 3A and S6). Several regions, however, show distinctive variation in chromatin contacts 5 between the two species: the X chromosome and two regions on M. muntjak chromosome 1 6 (186-355 and 615-630 Mb). Since our sequenced M. reevesi sample is male [10] while the 7 sequenced M. muntjak sample is female [34], we expect a difference in chromatin contacts on the 8 X chromosome, a finding that is further supported by analysis of copy number across the genome 9 using the 10X Genomics linked-read data ( Figure 3B). From this copy number analysis, we also 1 0 hypothesize that the two regions on M. muntjak chromosome 1 (186-355 and 615-630 Mb) are a 1 1 haplotype-specific duplication and a haplotype-specific deletion, respectively, which would 1 2 explain the difference in chromatin signal between the two muntjacs ( Figure 3C-D). Although 1 3 the inter-bin analysis identified long-range chromatin changes between sites 5 Mb apart, our 1 4 quantitative comparison of 1 Mb intra-bin chromatin contacts found substantial chromatin 1 5 conservation between the genome assemblies, including nearly all of the fusion sites. This 1 6 conclusion is further supported by intra-bin analysis with 100 kb bins ( Figure S8). 1 7 1 8 On a multi-megabase length scale, mammalian chromosomes can be subdivided into alternating 1 9 A/B compartments based on intra-chromosome contacts; these compartments correspond to open 2 0 and closed chromatin, respectively, and differ in gene density and GC content [35]. To test 2 1 whether these compartments are conserved or disrupted by fusions, we computed the A/B 2 2 chromatin compartment structures for M. muntjak and M. reevesi from the Hi-C data, again using 2 3 the M. muntjak assembly as the backbone for comparison. We found that, in general, 1 compartment boundaries are not well conserved between the muntjacs ( Figure S9). Specifically, 2 for A/B compartments larger than 3 Mb (i.e., containing more than three 1 Mb bins), only 17 3 compartments were completely conserved between the two species, out of 221 A/B 4 compartments analyzed in M. muntjak and 161 in M. reevesi. We found that many of the 5 compartments in M. reevesi are subdivided into multiple compartments in M. muntjak. 6 Combining our analysis of A/B compartments and the chromatin contacts, we found that the 7 extensive set of fusions in the M. muntjac lineage altered three-dimensional genome structure at 8 the multi-megabase scale while still maintaining conservation at the local level. These large-9 scale chromatin changes accompanying karyotype change must have only limited effects on the 1 0 underlying gene expression, since the two muntjac species can produce sterile hybrid offspring 1 1 [36]. Similar uncoupling between genome topology and gene expression has been observed in 1 2 Drosophila melanogaster [37]. 1 3 1 4 Genic evolution accompanying rapid karyotype change. Finally, we searched for genic 1 5 differences between muntjac that may have accompanied rapid karyotype evolution. These 1 6 could, for example, be mutations that led to dysfunctional chromosome maintenance and thus 1 7 triggered the rapid occurrence of multiple fusions, such as by destabilization of telomeres. More 1 8 subtly, these genic changes could have occurred as a response to chromosomal change; for 1 9 example, the dramatic reduction in the number of telomeres following large-scale fusion could 2 0 be permissive for mutations that make telomere maintenance less efficient. Our survey of gene 2 1 and gene family differences between muntjacs were suggestive but ultimately inconclusive. In 2 2 particular, we found evidence for positive selection of centromere-associated proteins CENPQ 2 3 and CENPV and meiotic double strand break protein MEI4 as well as the expansion of the 1 nucleosome-binding domain-containing HMG14 family in M. muntjak. 2 3 4

6
We present here new chromosome-scale assemblies of two muntjac deer that differ dramatically 7 in karyotype, despite only limited sequence change, after ~4.9 million years of divergence. 8 Analysis of these new assemblies revealed multiple changes in the underlying chromosome 9 structure, including variation in the A/B compartments despite maintenance of local (i.e., sub-1 0 megabase) three-dimensional genome contacts. One of the chromosome fusions reverses an 1 1 earlier chromosome fission to the resolution of our assemblies, with the two events being 1 2 separated by more than eight million years. Several chromosome maintenance associated 1 3 proteins show accelerated evolution in M. muntjak, although functional studies will be required 1 4 to determine any possible causal link to rapid karyotype change. Future studies will use these 1 5 assemblies to resolve the nature of the fusion sites and to better understand the biological 1 6 mechanisms related to chromosome fissions and fusions in muntjac. value of 1e-5 and inflation value of 1.5. One-to-one orthologous muntjac genes were extracted 1 7 from the pairwise OrthoVenn output, and Yang-Nielsen synonymous and nonsynonymous 1 8 substitution rates were calculated with the Ks calculation script (commit 78dda1e; 1 9 https://github.com/tanghaibao/bio-pipeline/tree/master/synonymous_calculation) Phylogeny. From the one-to-one orthologous genes of all five species identified by OrthoVenn, 2 1 codons with potential four-fold degeneracy were extracted from the B. taurus Ensembl release 94 2 2 Sep 2011 genebuild, excluding codons spanning introns, using custom script 4Dextract.py (v1.0).

3
Using the ROAST-merged MAF file with B. taurus as reference, the corresponding codons were 1 identified in the other four species, checking for corresponding amino acid conservation and 2 excluding any codons that span two alignment blocks in the MAF file. The output fasta file 3 containing four-fold degenerate bases was converted in phylip format with BeforePhylo (commit 4 0885849; https://github.com/qiyunzhu/BeforePhylo) and then analyzed with RAxML (v8.2.11) 5 [66] using the GTR+Gamma model of substitution with outgroup B. taurus. As previously The tree denotes the ancestral karyotype at each node and the six branches with fission and 4 fusion events relative to the ancestral karyotype. The lack of fissions or fusions on the R.  (Table S6), chromosome ends, the X chromosome, the potential M. muntjak 2 1 haplotype-specific duplication, and the potential M. muntjak haplotype-specific deletion colored.

2
The expected result of conserved Hi-C contacts is represented with a dashed red line. For fusion 2 3 site ranges spanning two bins, the bin containing the majority of the fusion site range was 1 deemed to be the fusion site bin. Copy number was calculated from normalized coverage of 2 adapter-trimmed 10X Genomics linked-reads for three regions with variation in the chromatin 3 reevesi (x axis) with the inter-bin contacts that span across but do not include the M. muntjak 2 0 lineage-specific fusion sites (Table S6) (Table S6) colored black. The 2 expected result of conserved Hi-C contacts is represented with a dashed red line. For fusion site 3 ranges spanning two bins, the bin containing the majority of the fusion site range was deemed to 4 be the fusion site bin. For fusion site ranges spanning three or more bins, the middle 100 kb 5 bin(s) was deemed to be the fusion site bin(s).   58,941,978,602 57,645,778,258 57,645,746,127 BTA2 93,282,424,724 79,668,719,766 79,668,719,765 BTA5 70,623,699,763 57,880,822,584 57,880,196,482 BTA6 63,301,370,450 Fusion reversal 68,435,455,313 BTA8 64,071,114,095 67,266,499,444 67,369,497,566 BTA9 63,670,013,115 64,824,087,945 64,824,087,945 1  : 103,142,201,151 Chr1: 103,650,852,521 5prox/22 Chr1: 267,859,926,350 Chr4: 52,627,781,577 2dist/11 Chr1: 1,006,153,006,638,886 Chr3: 100,711,302,993 18/25 Chr2: 216,351,390,051 Chr2: 210,159,200,937 25/26 Chr2: 256,956,281,016 Chr2: 169,793,063,149 26_28 (B. taurus fission) Chr2: 305,072,072,202 Chr2: 121,918,082,397 27/8dist Chr2: 580,274,942,905 Chr9: 40,769,560,918