Chromosomal inversion polymorphisms shape the genomic landscape of deer mice

Harringmeyer, Olivia S.; Hoekstra, Hopi E.

doi:10.1038/s41559-022-01890-0

Article
Open access
Published: 17 October 2022

Nature Ecology & Evolution volume 6, pages 1965–1979 (2022)

Abstract

Chromosomal inversions are an important form of structural variation that can affect recombination, chromosome structure and fitness. However, because inversions can be challenging to detect, the prevalence and hence the significance of inversions segregating within species remains largely unknown, especially in natural populations of mammals. Here, by combining population-genomic and long-read sequencing analyses in a single, widespread species of deer mouse (Peromyscus maniculatus), we identified 21 polymorphic inversions that are large (1.5–43.8 Mb) and cause near-complete suppression of recombination when heterozygous (0–0.03 cM Mb⁻¹). We found that inversion breakpoints frequently occur in centromeric and telomeric regions and are often flanked by long inverted repeats (0.5–50 kb), suggesting that they probably arose via ectopic recombination. By genotyping inversions in populations across the species’ range, we found that the inversions are often widespread and do not harbour deleterious mutational loads, and many are likely to be maintained as polymorphisms by divergent selection. Comparisons of forest and prairie ecotypes of deer mice revealed 13 inversions that contribute to differentiation between populations, of which five exhibit significant associations with traits implicated in local adaptation. Taken together, these results show that inversion polymorphisms have a significant impact on recombination, genome structure and genetic diversity in deer mice and likely facilitate local adaptation across the widespread range of this species.

Main

A longstanding goal in population genetics has been to quantify intraspecific genetic variation, which serves as the substrate for evolutionary change. Since Lewontin and Hubby first characterized protein sequence variation in Drosophila pseudoobscura in 1966, tremendous progress has been made in measuring levels of single nucleotide polymorphisms (SNPs) in a wide diversity of species¹. However, the prevalence of structural genomic variation, a focus of cytogenetics, remains largely uncharacterized in the molecular era². Chromosomal inversions, in particular, are an important form of structural variation: inversions can be large (affecting megabases of sequence)³ and have been implicated in local adaptation, including differentiation of annual and perennial ecotypes of monkeyflowers⁴, wing-pattern morphs of mimetic butterflies⁵ and mating types of ruffs^6,7.

Inversions may play a key role in local adaptation because of their effects on recombination. When heterozygous, an inversion will suppress recombination with the noninverted arrangement and, as a result, can drastically increase linkage disequilibrium (LD) between the loci it carries. As such, inversions can act as ‘supergenes’⁸, linking multiple locally adaptive alleles together into coinherited haplotype blocks, which may be advantageous in the face of gene flow^9,10,11. Although inversions have been identified across a diversity of species in the context of local adaptation, suggesting that beneficial inversions may be common³, few studies have performed unbiased scans across the genome for inversion polymorphisms (but see refs. ^12,13,14,15,16), raising the question of whether adaptive inversions are the exception or the rule. Thus, characterizing the abundance of inversion polymorphisms (that is, inversions segregating within a species) is a critical step towards quantifying levels of intraspecific genetic variation and understanding how and why inversion polymorphisms are established and maintained.

Detecting inversion polymorphisms with molecular data has traditionally been challenging (for example, breakpoints often reside in highly repetitive regions)¹⁷, but recent advances in long-read sequencing and increased feasibility of population-level genome resequencing provide new, powerful approaches for identifying inversions^18,19. Using these approaches, recent studies have revealed the abundance of inversion polymorphisms in a few species: for example, sunflowers harbour dozens of large (1–100 Mb) inversion polymorphisms¹⁶, and humans harbour, on average, hundreds of inversion polymorphisms that affect more DNA base pairs (bp) in total than SNPs^12,13.

Here, we perform an unbiased genome-wide scan for inversion polymorphisms in the deer mouse, Peromyscus maniculatus. The deer mouse is the most abundant and widespread mammal in North America: it has large effective population sizes^20,21 and a range spanning all major terrestrial habitats, including dense forests and open prairies²². Early cytogenetic work in deer mice identified at least 13 visible chromosomal rearrangements^23,24. Returning to this system in the molecular age, we detect 21 large inversion polymorphisms segregating within deer mice (some of which are likely to overlap with rearrangements detected by cytogenetics). In localizing these inversions, we determine their positions relative to centromeres and telomeres, explore their effects on chromosome structure, characterize genomic content at their breakpoints and propose a mechanism by which inversions arise in this species. Further, we quantify the impact of the inversions on recombination and the resulting effects on mutational load. Finally, we survey the distributions of the inversions across the species range and identify several inversions that contribute to local adaptation. Taken together, these results reveal proximate and ultimate mechanisms involved in the establishment and maintenance of inversion polymorphisms and suggest a prominent role for these inversions in local adaptation.

Results

Identifying inversion polymorphisms

To identify putative inversion polymorphisms, we initially focused on five populations—four deer mouse (P. maniculatus) and one oldfield mouse (Peromyscus polionotus), which is nested within the P. maniculatus clade (Fig. 1a)—and performed whole-genome resequencing (15× coverage with Illumina short-read data) on 15 individuals per population. To identify patterns of genetic variation consistent with inversion polymorphisms, we first characterized local population structure within populations and between population pairs in 100 kb windows across the genome using local principal component analyses (PCA)²⁵ and identified outlier regions (Fig. 1b and Extended Data Fig. 1; as described in refs. ^16,26). We then focused on genomic regions for which the first principal component separated individuals into three clusters, probably representing the three possible inversion genotypes (Fig. 1c and Extended Data Fig. 1), with the central cluster having the highest heterozygosity, consistent with inversion heterozygotes (Fig. 1d and Extended Data Fig. 1).
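
The three-cluster criterion described above can be illustrated with a small sketch. It assumes per-individual PC1 scores and heterozygosity values for one candidate region are already available; the function names (`cluster_pc1`, `middle_cluster_most_heterozygous`) and the simple one-dimensional k-means are illustrative choices, not the clustering procedure used in the study.

```python
from statistics import mean


def cluster_pc1(pc1, n_iter=50):
    """Assign each individual to one of three clusters along PC1 (1-D k-means).

    Centres start at the minimum, midpoint and maximum of PC1, so labels
    0, 1 and 2 run from one end of the axis to the other.
    """
    lo, hi = min(pc1), max(pc1)
    centres = [lo, (lo + hi) / 2, hi]
    labels = []
    for _ in range(n_iter):
        # Assign every individual to its nearest centre.
        labels = [min(range(3), key=lambda k: abs(x - centres[k])) for x in pc1]
        updated = []
        for k in range(3):
            members = [x for x, lab in zip(pc1, labels) if lab == k]
            updated.append(mean(members) if members else centres[k])
        if updated == centres:
            break
        centres = updated
    return labels


def middle_cluster_most_heterozygous(labels, heterozygosity):
    """True if cluster 1 (the central one) has the highest mean heterozygosity."""
    means = []
    for k in range(3):
        values = [h for h, lab in zip(heterozygosity, labels) if lab == k]
        if not values:
            return False  # fewer than three occupied clusters
        means.append(mean(values))
    return means[1] > max(means[0], means[2])


# Hypothetical values for nine individuals in one candidate region.
pc1_scores = [-1.0, -0.9, -1.1, 0.0, 0.1, -0.1, 1.0, 0.9, 1.1]
het = [0.001, 0.001, 0.002, 0.010, 0.011, 0.009, 0.002, 0.001, 0.001]
genotype_labels = cluster_pc1(pc1_scores)
print(genotype_labels, middle_cluster_most_heterozygous(genotype_labels, het))
```

A region passing both checks is only a candidate: the LD and recombination analyses that follow are what distinguish an inversion from other causes of local population structure.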

To verify that these genomic patterns were driven by suppression of recombination between haplotypes, we next measured linkage disequilibrium (LD) and recombination rates. In wild-caught mice, LD across all genotypes (but not within homozygotes) was elevated within predicted inversion regions (Fig. 1e and Extended Data Fig. 1), suggesting that recombination is suppressed between but not within haplotypes. We also estimated recombination rates using laboratory-raised inversion heterozygotes and found that putative inversion regions showed nearly complete suppression of recombination in heterozygotes (mean recombination per inversion: 0–0.03 cM Mb⁻¹; Fig. 1f and Extended Data Fig. 1). Together, these results suggest that suppression of recombination is specifically driven by heterozygotes, providing strong evidence that inversion polymorphisms occur in the identified regions. In total, using this approach, we identified 21 inversion polymorphisms in this species. This is a conservative estimate because our approach was limited to identifying inversions >1 Mb in length with a minimum allele frequency of ~10%.

Owing to their number and sizes, these inversions alone affect recombination rates on a massive scale. The detected inversions range in size from 1.5 to 43.8 Mb and, in total, span 17.5% of the deer mouse genome. These inversions cause a near-complete suppression of recombination in heterozygotes: inversion regions show an average recombination rate of only 0.01 cM Mb⁻¹, compared with a genome-wide rate (excluding inversion regions) of 0.80 cM Mb⁻¹ (Extended Data Fig. 2). We also found no significant correlation between inversion size and recombination rate, highlighting how even the largest inversions almost completely suppress recombination (Extended Data Fig. 2). As a consequence, inversions can trap existing mutations or accumulate new mutations and maintain them in LD. Indeed, we found that genetic differentiation (F_ST) between inversion and standard haplotypes was elevated in a block-like structure (Fig. 1g and Extended Data Fig. 1), suggesting that the inversions partition genetic variation into large haploblocks, shaping patterns of genetic diversity across the deer mouse genome.
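
To make the scale of suppression concrete, the reported rates can be converted into expected genetic map lengths. This is a back-of-the-envelope sketch: applying the genome-wide average rate to a single region is a simplifying assumption, and `map_length_cm` is a helper introduced only for illustration.

```python
def map_length_cm(length_mb, rate_cm_per_mb):
    """Genetic map length (cM) of a region, given its size and recombination rate."""
    return length_mb * rate_cm_per_mb


INVERSION_RATE = 0.01        # cM/Mb, mean within inversions in heterozygotes (from the text)
BACKGROUND_RATE = 0.80       # cM/Mb, genome-wide rate excluding inversions (from the text)
LARGEST_INVERSION_MB = 43.8  # size of the largest detected inversion (from the text)

fold_reduction = BACKGROUND_RATE / INVERSION_RATE
as_heterozygote = map_length_cm(LARGEST_INVERSION_MB, INVERSION_RATE)
at_background = map_length_cm(LARGEST_INVERSION_MB, BACKGROUND_RATE)

print(f"fold reduction in recombination: about {fold_reduction:.0f}x")
print(f"largest inversion: {as_heterozygote:.2f} cM when heterozygous "
      f"vs {at_background:.2f} cM at the background rate")
```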

Inversion breakpoints

To localize inversion breakpoints, we performed PacBio long-read sequencing for one individual from each of the five focal populations and created de novo genome assemblies at the contig level (Extended Data Table 1). By aligning the de novo genome assemblies to the deer mouse reference genome (NCBI accession: GCA_003704035.3), we identified breakpoints for 13 of the 21 inversions (Fig. 2a and Extended Data Fig. 3). The eight inversions for which we did not identify breakpoints included five inversions (inv6.0, inv7.0, inv7.1, inv19.0, inv21.0) not represented in homozygotes among the PacBio-sequenced individuals (Extended Data Table 2); repetitive sequences probably prevented assembly across breakpoints for the remaining three inversions (inv10.0, inv11.0, inv15.1). Using the de novo genome assemblies, we predicted unique centromere locations for 21 of the 23 autosomes using a 344 bp satellite sequence that localizes to deer mouse centromeres²⁷. Although centromeres are notoriously difficult to assemble²⁸, the de novo genome assemblies spanned multiple predicted centromeres, revealing the highly repetitive nature of centromeric regions, with satellite sequence repeats spanning as much as 1.1 Mb (Fig. 2b). Together these data allowed us to precisely map many of the inversions to chromosomes and their positions relative to centromeres.

**Fig. 2: Genome-wide map of inversions.**

We found that the distribution of the inversion polymorphisms across the genome is nonrandom. Of the 21 inversions, 15 are terminal, where the inversion ends within 1.5 Mb of the end of the chromosome (Fig. 2c). In addition, nine inversions have breakpoints (predicted or identified) within 1 Mb of the centromere (Fig. 2c); as predicted centromeres localize within the three inversions with identified breakpoints (inv13.0, inv14.0, inv20.0) and the other six inversions (inv6.0, inv7.0, inv7.1, inv10.0, inv15.1, inv19.0) are terminal and occur on acrocentric chromosomes, these inversions are likely to be pericentric (contain the centromere). As such, these nine inversions may toggle chromosomes between acrocentric and metacentric states, shifting centromere locations by as much as 43 Mb. In addition, these results suggest that centromeric and telomeric regions are likely to harbour inversion breakpoints in deer mice.
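
The two positional criteria used above (terminal: ending within 1.5 Mb of a chromosome end; centromere-adjacent: a breakpoint within 1 Mb of the centromere) can be written as simple predicates. The sketch below uses hypothetical coordinates and assumes a single point estimate for the centromere position; it is not the authors' annotation code.

```python
def is_terminal(start, end, chrom_len, max_gap=1_500_000):
    """True if the inversion ends within max_gap bp of either chromosome end."""
    return start <= max_gap or (chrom_len - end) <= max_gap


def near_centromere(start, end, centromere_pos, max_dist=1_000_000):
    """True if either breakpoint lies within max_dist bp of the centromere."""
    return min(abs(start - centromere_pos), abs(end - centromere_pos)) <= max_dist


# Hypothetical example: an inversion near the start of a 100 Mb acrocentric chromosome.
chrom_len = 100_000_000
inv_start, inv_end = 500_000, 40_000_000
centromere = 200_000

print(is_terminal(inv_start, inv_end, chrom_len))     # terminal by the 1.5 Mb rule
print(near_centromere(inv_start, inv_end, centromere))  # breakpoint within 1 Mb of centromere
```

An inversion that satisfies both predicates on an acrocentric chromosome is the kind the text infers to be pericentric.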

We also identified multiple genomic regions with recurrent inversion breakpoints. For example, on chromosome 7, we detected two overlapping inversions (inv7.2, inv7.3) with nearly identical breakpoints localizing only 80.2 kb apart (Fig. 2c, inset). Using whole-genome alignments between P. maniculatus and Peromyscus californicus, an outgroup, we determined the ancestral versus derived orientation for these two inversions and found that they arose independently rather than as a series of nested inversions. We also identified two inversions on chromosome 15 (inv15.1, inv15.2) with a shared breakpoint (although we localized breakpoints for only one of these inversions with the de novo assemblies) and two additional inversions on chromosome 7 (inv7.0, inv7.1) with breakpoints both occurring near the telomere (although we were unable to localize breakpoints for either) (Fig. 2c). The recurrence of inversion breakpoints further suggests that certain genomic regions have a greater tendency to participate in the formation of chromosomal rearrangements.

Characterizing the nature of inversion breakpoint regions is critical to understanding how inversions arise and why some genomic regions may be more susceptible to breakpoints. There are two major mechanisms by which inversions form: (1) nonhomologous end joining (NHEJ) can create inversions if double-stranded breaks occur and the sequence is reintegrated in reverse orientation; and (2) nonallelic homologous recombination (NAHR) can yield inversions if intrachromosomal crossing over occurs between inverted repeats (Fig. 3a). For 12 of the 13 inversions with localized breakpoints, we identified at least one pair of inverted repeats flanking the inversion (Fig. 3b). These inverted repeats ranged from 500 bp to 50 kb in length (Fig. 3b) and were often duplicated near the breakpoints (Fig. 3c and Extended Data Fig. 4). This suggests that the vast majority of inversions for which we identified breakpoints likely arose owing to NAHR, as opposed to NHEJ, consistent with the formation of inversions in humans²⁹.
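
The signature that distinguishes NAHR from NHEJ is a pair of inverted repeats: sequence near one breakpoint whose reverse complement occurs near the other. A minimal sketch of that check follows, using exact k-mer matching on toy sequences. Real inverted repeats of 0.5–50 kb would be found with an aligner that tolerates mismatches, so this illustrates the idea only and is not the detection method used in the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def shared_inverted_kmers(left_flank, right_flank, k):
    """Return (offset, k-mer) pairs from left_flank whose reverse complement
    occurs anywhere in right_flank."""
    right_kmers = {right_flank[i:i + k] for i in range(len(right_flank) - k + 1)}
    hits = []
    for i in range(len(left_flank) - k + 1):
        kmer = left_flank[i:i + k]
        if revcomp(kmer) in right_kmers:
            hits.append((i, kmer))
    return hits


# Toy flanks: the right flank carries the reverse complement of ACGTTCAG.
left = "AAAA" + "ACGTTCAG" + "AAAA"
right = "CCCC" + "CTGAACGT" + "CCCC"
print(shared_inverted_kmers(left, right, k=8))
```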

We next explored whether the breakpoints were enriched in repetitive genomic regions. For the 20 localized inversion breakpoints (excluding six breakpoints at chromosome ends), we used SEDEF³⁰ to identify segmental duplications (SDs), defined as duplicated sequence within 500 kb of the breakpoint that is >1 kb in length and contains <70% common repeats. We found that breakpoint regions were significantly enriched for SDs compared with randomized regions genome-wide (Kolmogorov–Smirnov test: P < 0.001); for example, 50% of breakpoints had an SD density above the 90th percentile of random regions genome-wide (Fig. 3d). The repetitive structure of the breakpoints varied, with some breakpoint regions harbouring highly structured SDs in tandem (Fig. 3e and Extended Data Fig. 4) and others harbouring multiple interspersed SDs (Extended Data Fig. 4). Together, these analyses show that genomic regions with an accumulation of SDs may be prone to chromosomal rearrangements via ectopic recombination in deer mice.
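
The enrichment comparison can be sketched with two small helpers: the two-sample Kolmogorov–Smirnov statistic (the largest gap between the two empirical distribution functions) and the share of breakpoint regions whose SD density exceeds a percentile of the random regions. Both functions and the nearest-rank percentile rule are illustrative assumptions. The sketch returns only the statistic D; a P value would normally come from a statistics library.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum distance between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for v in set(a) | set(b):
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def fraction_above_percentile(values, reference, pct=90):
    """Share of `values` above the pct-th percentile (nearest rank) of `reference`."""
    ref = sorted(reference)
    rank = max(1, -(-pct * len(ref) // 100))  # integer ceiling of pct/100 * n
    threshold = ref[rank - 1]
    return sum(v > threshold for v in values) / len(values)


# Hypothetical SD densities: 100 random regions and four breakpoint regions.
random_regions = list(range(1, 101))
breakpoint_regions = [95, 99, 50, 10]
print(ks_statistic(breakpoint_regions, random_regions))
print(fraction_above_percentile(breakpoint_regions, random_regions))
```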

Frequencies and evolution of inversions

To explore the distributions of these inversions, we next characterized their frequencies across the species range. We first determined the derived inversion arrangement based on genome alignments with an outgroup, P. californicus, and then genotyped the inversions in 218 mice from 13 populations (Fig. 4a). Most inversions were found in multiple populations: 18 of the 21 inversions were present in at least three of the 13 sampled populations (Fig. 4b). However, the varying distributions of the inversions suggest that they have differing evolutionary histories (for example, inversion age and selection): some inversions (for example, inv14.0) are widespread, whereas others (for example, inv7.2) are spatially constrained (Fig. 4c and Extended Data Fig. 5). The highly polymorphic nature of many of the inversions (for example, inv21.0) (Fig. 4c and Extended Data Fig. 5) was particularly striking, with 16 of 21 inversions segregating in at least two of the sampled populations (Fig. 4b). As such, inversion heterozygotes are common (Fig. 4b), indicating that the inversions have a profound impact on recombination rates in the wild.

**Fig. 4: Distributions and frequencies of inversions.**

Limited evidence for deleterious effects of inversions

To explore any negative consequences of inversions on fitness, we first examined possible deleterious effects due to inversion breakpoints. If an inversion breakpoint occurs within or near a gene, it may substantially affect the function and/or expression of that gene³¹. We found that significantly fewer inversion breakpoints occurred within protein-coding genes than expected based on the deer mouse gene density (binomial test: P = 0.004): of the 13 inversions for which we localized breakpoints, only two inversions (inv9.1, inv18.0) had breakpoints occurring within a protein-coding gene (inv9.1 disrupts the 1700129C05Rik intron, inv18.0 disrupts the Slc39a5 coding sequence (left breakpoint) and Baz2a intron (right breakpoint)) (Fig. 5a). Whereas these two inversions may affect phenotypes through disrupting gene function, the other 11 inversions with localized breakpoints do not disrupt annotated genes (Fig. 5a) and are thus less likely to convey strongly deleterious effects, although their breakpoints may still influence gene expression.
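
A one-sided binomial test of this kind asks how likely it is to see so few breakpoints inside genes if breakpoints fell at random with respect to gene content. The sketch below computes the lower-tail probability directly. The counts and the genic fraction in the example are hypothetical placeholders, since the exact inputs to the reported test are not given here, so the printed value is not expected to match the reported P value.

```python
from math import comb


def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): probability of k or fewer successes."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical inputs: 3 of 26 breakpoints inside genes, with an assumed
# 35% of the genome covered by protein-coding genes.
n_breakpoints = 26
n_in_genes = 3
genic_fraction = 0.35
print(binomial_lower_tail(n_in_genes, n_breakpoints, genic_fraction))
```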

We next characterized possible mutational loads carried by the inversions, which may accumulate owing to suppressed recombination in inversion heterozygotes³². To do so, we tested whether the inversions were enriched for nonsynonymous mutations relative to the standard haplotypes. We found that the inversions did not show a significant increase in their proportion of segregating nonsynonymous to synonymous mutations (pN/pS) compared with the standard haplotypes (two-sided t-test: P > 0.05 for all inversions), nor did they show a significant increase in nucleotide diversity at nonsynonymous versus synonymous sites (π_N/π_S) compared with the standard haplotypes (two-sided t-test: P > 0.05 for all inversions) (Fig. 5b). In addition, neither the inversions nor the standard haplotypes showed enrichment for nonsynonymous mutations (pN/pS and π_N/π_S) relative to the rest of the genome (one-sided t-test: P > 0.05 for all inversions and standard haplotypes) (Fig. 5b), and we did not find a correlation between inversion heterozygote frequency and mutational load (Extended Data Fig. 6). Using nonsynonymous mutation accumulation as an estimate of mutational load, these results suggest that the inversions do not harbour a strong deleterious mutational load.

In addition, if inversions accumulate a recessive mutational load, inversion homozygotes should be rare (for example, in butterflies³³ and sparrows³⁴). In deer mice, however, inversion genotype frequencies are consistent with Hardy–Weinberg equilibrium (HWE): we found only five (of 73) instances in which a segregating inversion significantly deviated from HWE within a population (Fig. 4b). This suggests that inversion homozygotes are not strongly underrepresented relative to expectation in populations segregating for a given inversion, which further supports the observation of limited mutational load. We also note that since most inversion genotype frequencies are consistent with random mating, strong assortative or disassortative mating by inversion genotype does not readily occur (unlike in the ruff^6,7 or white-throated sparrow³⁴). Together, these lines of evidence suggest that these inversions in deer mice are not associated with strongly negative effects on fitness.
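
A deviation from HWE can be checked per population by comparing observed genotype counts with those expected from the inversion allele frequency. The sketch below uses the chi-square goodness-of-fit test with one degree of freedom, which is an assumption: the specific HWE test used is not stated here, and with samples of roughly 15 individuals an exact test is often preferable. The genotype counts are hypothetical.

```python
from math import erfc, sqrt


def hwe_chi_square(n_std_std, n_std_inv, n_inv_inv):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic inversion.

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_std_std + n_std_inv + n_inv_inv
    p = (2 * n_inv_inv + n_std_inv) / (2 * n)  # inversion allele frequency
    q = 1 - p
    if p in (0.0, 1.0):
        return 0.0, 1.0  # monomorphic: nothing to test
    expected = (q * q * n, 2 * p * q * n, p * p * n)
    observed = (n_std_std, n_std_inv, n_inv_inv)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 d.f.
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical counts: standard homozygotes, heterozygotes, inversion homozygotes.
print(hwe_chi_square(4, 8, 3))
```

A deficit of inversion homozygotes, as expected under a recessive mutational load, would appear as a heterozygote excess and a small P value.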

Multiple inversions contribute to local adaptation

To explore the role of positive selection in the establishment and maintenance of these inversion polymorphisms, we characterized the contribution of inversions to local population differentiation. We took advantage of previous work on two populations, representing forest and prairie deer mouse ecotypes (populations c and e, Fig. 4a), which are well characterized and widespread²⁰. Forest and prairie mice show many pronounced phenotypic differences (for example, coat colour, tail length, foot length) despite ongoing gene flow. We previously identified an inversion on chromosome 15 (inv15.0) that contributes to phenotypic divergence between these ecotypes²⁰. Returning to this system, we found that multiple newly identified inversions were also major contributors to differentiation between these populations. Specifically, genome-wide F_ST is low between ecotypes (genome-wide forest–prairie F_ST: 0.03 ± 0.03) owing to high migration rates²⁰, yet we found multiple ‘genomic islands of divergence’ that showed remarkable overlap with identified inversion polymorphisms (inversion-region forest–prairie F_ST: 0.26 ± 0.16) (Fig. 6a). For 13 inversions, the ecotypes differed by >50% in their inversion frequencies. Using forward-genetic simulations in SLiM³⁵, we found that for a locus to be maintained at >50% frequency difference between the forest and prairie ecotypes given high gene flow, it was most likely to be evolving under divergent selection (Extended Data Fig. 7), implicating these 13 inversions in local adaptation.
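
For a single biallelic locus such as an inversion, differentiation between two populations can be approximated from the two arrangement frequencies alone. The sketch below uses the textbook heterozygosity-based definition of F_ST with equal weighting of the two populations. This is a simplifying assumption and not necessarily the estimator used for the genome-wide values reported above. The frequencies are hypothetical.

```python
def fst_two_populations(p1, p2):
    """F_ST = (H_T - H_S) / H_T for one biallelic locus in two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # locus is fixed for the same arrangement in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def frequency_difference_exceeds(p1, p2, threshold=0.5):
    """True if arrangement frequencies differ by more than `threshold` (the >50% criterion)."""
    return abs(p1 - p2) > threshold


# Hypothetical inversion frequencies in the forest and prairie populations.
forest, prairie = 0.9, 0.2
print(fst_two_populations(forest, prairie), frequency_difference_exceeds(forest, prairie))
```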

**Fig. 6: Inversions involved in local adaptation.**

The distributions of these inversions across a forest–prairie habitat gradient further support their role in adaptation. Specifically, we genotyped the 13 polymorphic inversions in 136 samples across an environmental gradient and found that nine inversions showed steep changes in frequency across the forest–prairie habitat transition (Fig. 6b and Extended Data Fig. 8), suggesting that these inversions may be favoured in alternate habitats. Furthermore, five inversions (inv7.2, inv14.0, inv15.0, inv18.0, inv21.0) were significantly associated with an ecotype-defining trait, tail length, in laboratory-raised F₂ hybrids²⁰ (P < 0.05, linear model) and, for all five, the forest arrangement was associated with longer tails (Fig. 6c), consistent with long tails being important for balance in arboreal habitats³⁶. These five inversions together explain 23.0% of the variance in tail length (individually explaining 2.0–12.5% of the variance, with additive effects ranging from 1.1 to 2.7 mm in tail length). Inv15.0 has also been previously found to be significantly associated with coat colour, a second ecotype-defining trait²⁰ (explaining 40% of coat colour variance) (Fig. 6c). Together, these results suggest that inversions may be a key source of genetic variation differentiating locally adapted deer mouse populations, with divergent selection likely to play a role in maintaining the inversions as polymorphisms within this species.
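
The additive effect and variance explained for a single inversion can be obtained from a simple regression of the trait on the number of forest-arrangement copies (0, 1 or 2). The sketch below fits that one-predictor model by ordinary least squares. It omits any covariates the published linear model may have included, and the F₂ data shown are hypothetical.

```python
def additive_fit(dosage, trait):
    """Least-squares fit of trait ~ dosage.

    Returns (slope, r_squared): the additive effect per arrangement copy and
    the proportion of trait variance explained.
    """
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    syy = sum((y - mean_y) ** 2 for y in trait)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, trait))
    slope = sxy / sxx
    r_squared = sxy**2 / (sxx * syy)
    return slope, r_squared


# Hypothetical F2 hybrids: copies of the forest arrangement and tail length (mm).
copies = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]
tail_mm = [58.0, 61.5, 60.0, 63.0, 59.5, 64.0, 61.0, 60.5, 62.0, 65.5]
effect, r2 = additive_fit(copies, tail_mm)
print(f"additive effect: {effect:.2f} mm per copy, variance explained: {r2:.1%}")
```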

Discussion

Technological advances in genome sequencing have recently led to new opportunities for characterizing intraspecific structural variation. For example, the ability to perform population-level whole-genome resequencing allows signatures of large structural variants such as chromosomal inversions to be more easily detected¹⁹. This approach has recently been successful in identifying inversions in sunflowers^16,26 and seaweed flies¹⁵ and now in deer mice. In addition, long-read sequencing has also greatly facilitated the detection and classification of structural variants. For example, here we found that inversion breakpoints reside in highly repetitive genomic regions, harbouring an enrichment of segmental duplications, similar to other mammalian species (that is, humans and great apes^13,17). The repetitive nature of mammalian inversion breakpoints probably explains why breakpoints are so challenging to detect with short-read sequencing data alone, as well as with long-read data if read length or coverage is insufficient to resolve repeat regions, as we suspect is the case for the deer mouse inversions for which we failed to localize breakpoints. Future work combining these two approaches—to perform population-level long-read genome sequencing—will further our ability to detect structural variation within a diversity of species¹⁸.

In discovering deer mouse inversion polymorphisms, we found that they have an interesting distribution in the genome: a majority of the inversions occur terminally, and most of these involve breakpoints near centromeres. The inversions with breakpoints adjacent to centromeres are likely to be shifting centromere locations from the middle of the chromosome to the end of the chromosome (and vice versa), transforming chromosomes between metacentric and acrocentric states. This result could explain the longstanding observation that deer mice vary in number of acrocentric chromosomes^23,24. Furthermore, inversions are also likely to influence chromosome accessibility owing to changes in the three-dimensional genome structure, which, in addition to the mutations the inversions carry, may influence the expression of genes found within the inversions. Despite this large variation in chromosome structure, deer mice (and, more generally, the Peromyscus genus) have a strongly conserved chromosome number (2n = 48)²⁴. Unlike the case in other rodents such as the house mouse, which harbours Robertsonian fusions³⁷, the large rearrangements involving centromeres occur primarily within and not between chromosomes in deer mice.

One hypothesis for why deer mouse inversions tend to involve telomeric and centromeric regions is that inversion breakpoints arise more frequently in these regions: genomic regions near centromeres and telomeres can harbour an excess of SDs (as well as other repeats), which may facilitate ectopic recombination³⁸. A second hypothesis is that inversions with breakpoints in telomeric or centromeric regions are less likely to be removed by purifying selection than inversions that occur in other genomic regions: breakpoints that occur near centromeres and telomeres may be unlikely to have deleterious effects as these regions tend to be gene-sparse³⁸. Indeed, none of the inversion breakpoints we found near centromeres (and only one near a telomere) disrupted protein-coding sequences. Terminal inversions may also be less likely than non-terminal inversions to have strong underdominant effects, which often occur owing to inversion loops that form in heterozygotes during meiosis³. If an inversion lacks homologous sequence on one side, such as in a terminal inversion, loop formation may be prevented. Previous evidence from deer mice suggests that inversion loop formation is rare in putative terminal inversions³⁹. Thus, deer mouse inversions involving telomeres and centromeres may confer fewer deleterious costs associated with breakpoint effects and underdominance than inversions occurring in the rest of the genome.

Inversions are a particularly interesting form of structural variation because of their effects on recombination. Inversions in deer mice, when heterozygous, suppress recombination across their entire lengths. The number and sizes of the inversions thus seem striking in the context of recombination: 21 detected inversion polymorphisms, with a mean length of 20.0 Mb, affect a total of 420 million DNA bp (or 17%) of the deer mouse genome. Although these results are consistent with large inversions causing suppression of recombination in other species (for example, quails⁴⁰, maize⁴¹ and cod⁴²), whether inversion polymorphisms affect similar proportions of the genome in other species remains largely unknown. Furthermore, as our approach was limited to detecting inversions >1 Mb in length, there are possibly many additional inversions of shorter lengths segregating within deer mice, which is an important direction for future work. Nevertheless, we found that the detected inversions substantially shape the recombination landscape of deer mice: although suppression of recombination is limited to inversion heterozygotes (so the frequency of an inversion will determine the extent to which it affects recombination), most deer mouse inversions are widespread and inversion heterozygotes are common in natural populations.

Recombination plays an important role in evolution through creating new combinations of alleles and increasing the efficiency of natural selection⁴³. In particular, through uncoupling deleterious and beneficial mutations, recombination reduces Hill–Robertson interference and facilitates the elimination of deleterious mutations and the spread of beneficial mutations^44,45. Given the benefits of recombination, the abundance of inversions presents a paradox. With reduced efficacy of purifying selection in the absence of recombination, the expectation is that inversions will accumulate a deleterious mutational load (when inversion heterozygotes are common)³², which will limit their spread⁴⁶. In deer mouse inversions, however, we did not find evidence for the accumulation of mutational load based on nonsynonymous mutations (although these inversions may harbour an excess of other types of deleterious variants such as transposable elements, which future work will further resolve), consistent with a recent study in sunflowers⁴⁷. In both deer mice and sunflowers, inversion homozygotes are common⁴⁷; as recombination proceeds uninterrupted in inversion homozygotes, deleterious mutations can efficiently be removed once an inversion reaches substantial allele frequency³², especially if effective population sizes (N_e) are high, as in many populations of deer mice (for example, N_e ≈ 4 × 10⁶ in a single population²⁰). As in sunflowers⁴⁷, we hypothesize that these inversions, which act as large-scale modifiers of recombination when heterozygous, largely evaded deleterious costs associated with suppressed recombination by quickly spreading to high frequencies in deer mice, whose large population sizes could facilitate effective purifying selection in inversion homozygotes³² (noting that gene conversion between inversion and standard haplotypes may also have a role in reducing deleterious mutational load³²).

A major hypothesis for the maintenance of inversion polymorphisms is the ‘local adaptation hypothesis’, which posits that when a population is locally adapting in the face of gene flow, suppressed recombination between multiple beneficial mutations can be advantageous, reducing the strength of selection necessary to establish and/or maintain each mutation in migration–selection equilibrium^9,10,11. As deer mice are found continuously across a wide range of habitats, they are subjected to a range of selective pressures, probably with ongoing gene flow. Our results support an important role for divergent selection in maintaining inversions as polymorphisms within the species at large. In particular, we found that 13 inversions, including one previously identified²⁰, are segregating between forest and prairie deer mouse ecotypes with high allele frequency differences and are likely to be subject to habitat-associated divergent selection, consistent with multiple inversions differentiating ecotypes in a diversity of species such as snails⁴⁸, cod⁴², sunflowers²⁶ and sticklebacks⁴⁹. Although it remains an open question whether the inversions segregating between these forest–prairie ecotypes are advantageous because of their suppression of recombination, the high levels of migration between the forest and prairie populations suggest that increased LD between adaptive alleles may be particularly beneficial in this system²⁰. In addition, five of these inversions have significant effects on tail length, and thus variation in this ecotype-specific trait is largely partitioned into inversions, consistent with the evolution of concentrated genetic architectures in the face of gene flow⁵⁰.

A concrete understanding of the prevalence and significance of inversion polymorphisms specifically, and of structural variation more generally, remains largely elusive across natural populations of organisms, especially mammals⁵¹. We find that inversion polymorphisms are abundant in deer mice. Whether the abundance of inversion polymorphisms in deer mice is unique or representative of mammalian species will require similar investigations across additional species. Nevertheless, this work highlights the critical role of inversions in shaping patterns of recombination, genetic diversity and chromosomal structure in the deer mouse and suggests that inversions may play an even more important part in local adaptation than previously appreciated.

Methods

Population sampling and sequencing

Focal population sampling

We focused our initial analyses on five populations of P. maniculatus, each representing a distinct subspecies (P. m. rubidus, P. m. gambelii, P. m. bairdii, P. m. nubiterrae and P. p. subgriseus). Tissues from 15–17 wild-caught mice per population were collected in Siuslaw National Forest, Oregon, USA (P. m. rubidus)²⁰, Baker City, Oregon, USA (P. m. gambelii)²⁰, Derry, Pennsylvania, USA (P. m. nubiterrae)⁵², Ocala National Forest, Florida, USA (P. p. subgriseus)⁵³ and Bridgewater, Michigan, USA (P. m. bairdii; obtained from the University of Michigan). All samples used in this study are listed in Supplementary Table 1.

Whole-genome resequencing and variant calling

To generate whole-genome resequencing data, we first extracted DNA from ~20 mg of liver tissue and generated sequencing libraries using Illumina DNA library preparation kits. We sequenced the resulting libraries using 150 bp paired-end sequencing on an Illumina NovaSeq S4 flow cell to obtain ~15× coverage per sample. Following demultiplexing, we mapped sequencing reads to the P. maniculatus bairdii reference genome (NCBI accession: GCA_003704035.3) using BWA-MEM. We accessed published whole-genome resequencing data for three populations: P. m. rubidus, P. m. gambelii²⁰ (NCBI: PRJNA688305) and P. p. subgriseus⁵³ (NCBI: PRJNA838595). To call variant sites, we used HaplotypeCaller (GATK3.8) on each sample with the default heterozygosity prior (-hets 0.001) and -ERC GVCF to produce per-sample genomic variant call format files (vcfs). Then, we ran GenotypeGVCFs (GATK3.8) to jointly genotype the samples. We performed hard filtering of SNPs based on GATK best practices (filtering variants with quality by depth (QD) < 2.0, FisherStrand (FS) > 60.0, mapping quality (MQ) < 40.0, MQRankSum < −12.5, ReadPosRankSum < −8.0) using VariantFiltration.
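
The hard-filtering thresholds above can be restated as a small rule table. The following is a minimal Python sketch, not the GATK implementation: the function names and the `info` dictionary are hypothetical, and skipping annotations that are absent for a site is an assumption of this sketch.

```python
# Thresholds quoted in the text; a SNP is removed if ANY condition holds.
HARD_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def failed_filters(info):
    """Return the names of the hard filters that one SNP fails.

    `info` is a hypothetical dict of INFO annotations for a single SNP.
    Annotations that are absent are skipped (an assumption of this sketch).
    """
    return [name for name, fails in HARD_FILTERS.items()
            if name in info and fails(info[name])]


def passes_hard_filters(info):
    """True if the SNP fails none of the hard filters."""
    return not failed_filters(info)


# Hypothetical annotation values for one SNP.
example = {"QD": 1.5, "FS": 75.0, "MQ": 60.0}
print(failed_filters(example))
```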

Identifying inversions

Local PCA

To identify genomic regions with outlier population structure, we performed local PCA with the lostruct package²⁵ in R on each of the five focal populations and for all focal population pairs. Note that when all populations are included, population structure is driven by population divergence, which masks the signatures of possible inversions. Therefore, we included only individual populations or population pairs for this analysis, such that inversion signatures were detectable. Using lostruct, we performed local PCA for 100 kb windows with a step size of 100 kb. We then computed the distance between PCA maps (with the top two PCs) using the pc_dist function with default parameters and visualized these distances using multidimensional scaling (MDS) with the cmdscale function with two MDS axes.

To identify genomic regions with unusual population structure, we scanned for consecutive 100 kb windows that showed similar population structure to each other and distinct population structure from the rest of the chromosome. To do so, we first performed k-means clustering of the 100 kb windows in the MDS space, defined by the MDS1 and MDS2 axes, using numbers of clusters from k = 2 to k = 10. To determine the best k, we chose the k with the maximum mean silhouette score, which measures how much closer each window lies to its own cluster than to the nearest neighbouring cluster, averaged over all windows. We then assigned 100 kb windows to the cluster determined by the k-means clustering for the chosen k. We next calculated the z score for the MDS1 score for each 100 kb window and selected genomic regions with consecutive windows belonging to the same cluster in which at least ten consecutive windows had z score >1.5.
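
The final selection step, finding runs of at least ten consecutive windows that share a cluster and have an MDS1 z score above 1.5, can be sketched as follows. This is an illustrative Python sketch rather than the authors' script: the function name is hypothetical, the z score uses the population standard deviation, and the threshold is applied to positive z scores only, taking the text literally.

```python
from statistics import mean, pstdev


def outlier_runs(mds1, clusters, z_cutoff=1.5, min_windows=10):
    """Find runs of consecutive windows in one cluster with z score > z_cutoff.

    mds1     : MDS1 coordinate per 100 kb window, in chromosome order
    clusters : k-means cluster label per window (same order)
    Returns a list of (first_index, last_index) pairs, both inclusive.
    """
    mu, sd = mean(mds1), pstdev(mds1)
    z = [(x - mu) / sd for x in mds1]
    runs, start = [], None
    # Iterate one step past the end so that a run touching the last window is closed.
    for i in range(len(z) + 1):
        inside = i < len(z) and z[i] > z_cutoff
        same = inside and start is not None and clusters[i] == clusters[start]
        if start is not None and not same:
            if i - start >= min_windows:
                runs.append((start, i - 1))
            start = None
        if inside and start is None:
            start = i
    return runs
```

With 100 kb windows, a returned pair `(first, last)` corresponds to a candidate region of `(last - first + 1) × 100 kb`.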

PCA and heterozygosity

For each identified outlier region, we performed PCA on the entire region using scikit-allel v.1.3.2 (https://github.com/cggh/scikit-allel). For scikit-allel analyses, we created zarr objects from the whole-genome resequenced vcfs using allel.vcf_to_zarr. We then performed PCA using all SNPs in the region, with the function allel.pca, with n_components = 10, scaler = ‘patterson’ and ploidy = 2. k-means clustering of samples in PC1 versus PC2 space was performed in R with kmeans, following the approach detailed by Todesco et al.¹⁶, where samples were assigned to three clusters, setting the cluster starting positions as the minimum, maximum and middle value for PC1 scores to prevent clustering from being influenced by unequal numbers of samples per cluster. When clustering into three groups failed, we tried clustering into two groups, which would be the case if only two inversion genotypes are present. In a few cases (n = 4), we manually reassigned clusters for samples when k-means clustering had clear misassignments. For each outlier region identified, we also computed heterozygosity (reported as the percentage of sites that are heterozygous) for every sample in the relevant populations, using count_het in scikit-allel. Finally, we selected putative inversions to be outlier regions for which samples clustered into three distinct groups along PC1 with high heterozygosity for the middle cluster. We also included an additional four regions for which samples clustered into only two distinct groups along PC1 but signatures of recombination suggested the presence of an inversion (see below).
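
The three-cluster assignment with fixed starting positions can be illustrated in one dimension. The sketch below is a plain-Python stand-in for the R `kmeans` call, not the authors' code: it clusters on PC1 only (the text clusters in PC1 versus PC2 space), and it interprets the "middle value" as the midpoint of the PC1 range, which is an assumption.

```python
def cluster_pc1(pc1, n_iter=100):
    """Assign samples to three clusters along PC1 with fixed starting centres.

    Centres start at the minimum, the midpoint of the range and the maximum
    of the PC1 scores, so unequal group sizes do not influence the start.
    Returns (labels, centres); labels are 0, 1 or 2 in order of increasing PC1.
    """
    lo, hi = min(pc1), max(pc1)
    centres = [lo, (lo + hi) / 2.0, hi]
    labels = []
    for _ in range(n_iter):
        # Assign each sample to its nearest centre.
        new_labels = [min(range(3), key=lambda k: abs(x - centres[k])) for x in pc1]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centre to the mean of its members.
        for k in range(3):
            members = [x for x, lab in zip(pc1, labels) if lab == k]
            if members:  # keep the old centre if a cluster is empty
                centres[k] = sum(members) / len(members)
    return labels, centres
```

Under this scheme, label 1 marks the putative inversion heterozygotes, which is the group expected to show elevated heterozygosity.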

Linkage disequilibrium

For each putative inversion, we computed LD across the chromosome harbouring that putative inversion using: (1) all samples belonging to the population or population pair from which the putative inversion was identified; and (2) only the samples homozygous for the more common haplotype, based on the PCA clustering. To compute LD, we subset the vcf by sample and chromosome with bcftools. We then used vcftools to filter for SNPs with minor allele frequency (MAF) > 5% (--maf 0.05) and number of missing genotypes = 0 (--max-missing-count 0) and thinned SNPs to at most one SNP per 1 kb (--thin 1000). We computed LD with vcftools (--geno-r2). Finally, we used the script emerald2windowldcounts.pl (https://github.com/owensgl/reformat, https://github.com/owensgl/haploblocks) to calculate the mean r² between 500 kb windows (that is, for a given set of two 500 kb windows, the mean r² across all pairwise SNP comparisons between the two windows was computed).
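
The window-level averaging can be sketched in a few lines of Python. This is an illustration of the calculation described above, not a port of emerald2windowldcounts.pl: the function name and the `(pos1, pos2, r2)` input layout are hypothetical, and windows are indexed by integer division of the position.

```python
from collections import defaultdict


def window_mean_r2(pairs, window=500_000):
    """Average r^2 over all SNP pairs that fall in each pair of windows.

    pairs : iterable of (pos1, pos2, r2) for SNP pairs on one chromosome
    Returns {(window_a, window_b): mean r^2} with window_a <= window_b.
    """
    totals = defaultdict(lambda: [0.0, 0])  # running sum and count per window pair
    for pos1, pos2, r2 in pairs:
        a, b = sorted((pos1 // window, pos2 // window))
        totals[(a, b)][0] += r2
        totals[(a, b)][1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


# Hypothetical SNP pairs: two between windows 0 and 1, one within window 0.
example = [(100, 600_000, 0.2), (200, 700_000, 0.4), (100, 200, 1.0)]
print(window_mean_r2(example))
```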

Recombination rates

We estimated recombination maps for both the whole genome and within inversion regions, using laboratory-raised F₂ hybrids from previous intercrosses between two population pairs: P. m. rubidus × P. m. gambelii²⁰ and P. m. bairdii × P. p. subgriseus⁵⁴, which yielded a total of 547 and 1,061 F₂ hybrids, respectively. Using double-digest restriction-site associated DNA (ddRAD) sequencing data of F₂ hybrids, we determined ancestry and the location of recombination breakpoints in the F₂ hybrids using the multiplexed shotgun genotyping pipeline (see ref. ²⁰ for details). For the P. m. rubidus × P. m. gambelii intercross, we genotyped the founders (n = 4) and F₁ hybrids (n = 49) of the intercross for the inversions (see Genotyping samples for inversions) to ensure that only F₂ hybrids that were offspring of F₁ inversion heterozygotes were used for computing recombination rates within inversion regions. All inversions analyzed in the P. m. bairdii × P. p. subgriseus intercross were fixed between the founders. Five inversions (inv7.0, inv7.3, inv9.1, inv15.2, inv20.0) were not represented by heterozygous F₁ hybrids and so we were unable to characterize recombination rates for these inversions.

Genetic differentiation

To measure genetic differentiation between inversion and standard haplotypes across each identified inversion, we computed F_ST between predicted homozygote genotypes (clusters 1 and 3 from PCA clustering) using scikit-allel. We performed sliding-window F_ST analyses for 10 kb windows with a 10 kb step size using scikit-allel with the windowed_hudson_fst function and visualized F_ST with loess smoothing in R.

To analyze genome-wide genetic differentiation between forest (P. m. rubidus) and prairie (P. m. gambelii) ecotypes, we computed F_ST between forest and prairie populations in 100 kb windows across the genome with a step size of 100 kb, using scikit-allel with the windowed_hudson_fst function.
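
For readers unfamiliar with Hudson's F_ST, the estimator for a single window can be written out from allele counts. The sketch below is a plain-Python illustration under stated assumptions (biallelic SNPs, a ratio of sums across the SNPs in a window, at least two sampled alleles per group); the function names are hypothetical and this is not the scikit-allel source.

```python
def _within(a, b):
    """Mean pairwise difference among alleles sampled within one group."""
    n = a + b
    pairs = n * (n - 1) / 2
    same = a * (a - 1) / 2 + b * (b - 1) / 2
    return (pairs - same) / pairs


def _between(a1, b1, a2, b2):
    """Mean pairwise difference between alleles drawn from different groups."""
    pairs = (a1 + b1) * (a2 + b2)
    same = a1 * a2 + b1 * b2
    return (pairs - same) / pairs


def hudson_fst(counts1, counts2):
    """Ratio-of-sums Hudson F_ST over the biallelic SNPs in one window.

    counts1, counts2 : lists of (ref_count, alt_count) per SNP, one list per group
    """
    num = den = 0.0
    for (a1, b1), (a2, b2) in zip(counts1, counts2):
        between = _between(a1, b1, a2, b2)
        within = (_within(a1, b1) + _within(a2, b2)) / 2
        num += between - within
        den += between
    return num / den
```

A window in which every SNP is fixed for different alleles in the two groups gives F_ST = 1, whereas identical allele frequencies give a value slightly below zero because of the finite-sample correction.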

Localizing inversion breakpoints

PacBio long-read sequencing and de novo genome assembly

We performed long-read sequencing on five individuals (laboratory-colony-raised mice), one from each focal population. First, we extracted high-molecular-weight DNA from 200 μl fresh blood using the MagAttract HMW DNA mini kit (Qiagen), following the Whole Blood protocol. We quantified the resulting DNA using a Genomic DNA ScreenTape on a TapeStation 4200 (Agilent). Library preparation and sequencing were performed at the PacBio Sequencing Core of the University of Washington. In brief, libraries were prepared with the SMRTbell Express Template Prep Kit 2.0 (PacBio). We performed a size selection of 30 kb for the P. m. rubidus, P. m. nubiterrae and P. m. bairdii samples using BluePippin (Sage Science); we did not perform any size selection for the P. m. gambelii and P. p. subgriseus samples as the total library mass was below 500 ng. We then sequenced each on a Sequel II SMRT Cell 8M (PacBio), the P. m. rubidus, P. m. nubiterrae and P. m. bairdii samples with a 15 h movie and the P. m. gambelii and P. p. subgriseus samples with a 30 h movie.

We converted the bam files from each movie to fastq files using bam2fastx (PacBio). We then used flye⁵⁵ to create de novo genome assemblies at the contig level for each population. The flye assembler uses a repeat graph to assemble across repetitive genomic regions, a critical feature for localizing inversion breakpoints, which often occur in repetitive genomic regions. To reduce run time, we downsampled to 40× coverage (--asm-coverage 40) for initial disjointig assembly but otherwise ran the assembler with default parameters. Genome qualities are reported in Extended Data Table 1.

To genotype each PacBio sample for the inversions, we first mapped the PacBio fastq files to the P. maniculatus reference genome using ngmlr⁵⁶. Then, we used longshot⁵⁷, a long-read-specific variant caller, to call variants for each sample. We merged the variant calls with the whole-genome resequencing vcfs and performed PCA for each inversion region, which allowed us to genotype the PacBio samples for the inversions (Extended Data Table 2; for details, see Genotyping samples for inversions).

Inversion breakpoint identification

We aligned the PacBio genome assemblies to the P. maniculatus bairdii reference genome using nucmer (mummer)⁵⁸ with default parameters. Owing to the possibility of reference genome errors, we reoriented any scaffolds in the reference genome that were misoriented relative to the P. m. bairdii long-read assembly (that is, we identified signatures of inversions or translocations in the P. m. bairdii long-read assembly relative to the reference genome and resolved these regions to match the P. m. bairdii long-read assembly). Thus, all inversion analyses were relative to the P. m. bairdii long-read assembly. We also aligned published P. californicus⁵⁹ (NCBI accession: GCA_007827085.2) and P. leucopus⁶⁰ (NCBI accession: GCA_004664715.2) genomes as well as previously assembled de novo genomes for P. m. rubidus and P. m. gambelii²⁰ from canu (a long-read genome assembler complementary to flye) to the P. maniculatus reference genome using nucmer.

For each inversion, we scanned for evidence of inversion breakpoints. To do so, we filtered for nucmer alignments >4 kb in length (or >10 kb for P. californicus, P. leucopus alignments). Inversion breakpoints are identifiable if: (1) a contig spans the inversion region and maps to the reference genome in opposite orientation within the inversion region; or (2) a contig spans only part of the inversion region and maps to the reference genome in opposite orientation to the flanking region of the other end of the inversion. We thus identified contigs that showed signatures of inversions in predicted inversion regions and identified breakpoint positions based on the PacBio assembly alignments to the P. maniculatus reference genome. In addition, we identified breakpoints for one of the predicted inversions based on the P. leucopus genome alignment to the P. maniculatus reference genome and one of the predicted inversions based on the P. californicus genome alignment to the P. maniculatus reference genome.

Determining derived arrangement

For each inversion polymorphism, we determined which arrangement was ancestral (standard) versus derived (inversion) based on the whole-genome alignments between P. californicus (outgroup) and P. maniculatus. We evaluated whether the P. californicus reference genome was inverted relative to the P. maniculatus reference genome for each inversion region, and we assigned the P. californicus orientation to be the ancestral, standard arrangement.

Predicting centromere locations

Peromyscus species are known to have satellite sequences that map to centromeres; specifically, a 344 bp satellite sequence (NCBI accession: KX555281.1) localizes to P. maniculatus centromeres²⁷. We used blastn (blast v.2.2.29) to map this satellite sequence to the P. maniculatus reference genome and to each PacBio genome assembly (as long-read genome assemblies are more likely to assemble across repetitive regions), filtering for alignments with >85% identity. Using this approach, we then determined centromere locations in the reference genome (converting alignment positions in the PacBio genomes to their corresponding or closest reference genome coordinates). To further explore the predicted centromeres, we created dotplots for contigs from the PacBio genomes that spanned a predicted centromere. To do so, we used nucmer with --maxmatch, -l 50, -c 100 to align each contig to itself and then plotted all alignments >100 bp using R.

Characterizing repeat content at inversion breakpoints

Dotplots

To evaluate whether inversion breakpoints occurred in repetitive regions, we created dotplots from the PacBio contig-level assemblies. We performed self-versus-self nucmer alignments for contigs spanning inversion breakpoints, with --maxmatch, -l 50, -c 100; we filtered for alignments >1 kb and plotted the alignments in R.

Inverted repeats and SDs

We identified inverted repeats and segmental duplications (SDs) near inversion breakpoints using the package SEDEF³⁰. For the relevant PacBio contigs identified above (spanning or adjacent to inversion breakpoints), we softmasked common repeats with RepeatMasker, using --xsmall and --species rodentia and masked the 344 bp centromere satellite sequence. We then performed SEDEF with default parameters on the entire set of relevant PacBio contigs. First, we determined inverted repeats to be any repeat identified by SEDEF that mapped in the opposite orientation to within 500 kb of both inversion breakpoints. Next, we called repeats as SDs if they were duplicated within 500 kb of a breakpoint, were ≥1 kb in length, had ≥70% identity with a duplication and had <70% of their sequence masked as common repeats. We then determined the density of SDs within 500 kb of each inversion breakpoint (note that we excluded breakpoints at chromosome ends as telomeres are not fully assembled in these genome assemblies). To compare the breakpoint SD density to that of random regions genome-wide, we also ran SEDEF on each contig from the P. m. bairdii PacBio genome assembly and called SDs. We then randomly selected 10,000 sites from across the genome and calculated the density of SDs within 500 kb of each site. Finally, we tested whether inversion breakpoints were significantly enriched for SDs relative to the 10,000 randomized genome-wide regions using the Kolmogorov–Smirnov test in R.

Genes near inversion breakpoints

We used the P. m. bairdii genome annotation (Pman2.1_chr_NCBI.corrected.merged-with-Apollo.Aug19.sorted_chr15.gff3) to explore whether inversion breakpoints disrupted annotated protein-coding genes. We tested whether the number of breakpoints disrupting gene sequence was expected by chance based on overall gene density using a binomial test; we calculated the gene density (including exons, introns and untranslated regions) to be 39% genome-wide and then used binom.test in R to perform a binomial test, with probability of success of 0.39.
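
The binomial test against a 39% gene density can be reproduced by hand. The sketch below is a plain-Python illustration of an exact two-sided binomial test, using the common convention of summing the probabilities of all outcomes no more likely than the observed one; whether it matches `binom.test` in every edge case has not been checked here. The function name and the counts in the usage line are hypothetical, not values from the study.

```python
from math import comb


def binom_two_sided(k, n, p):
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of every outcome that is no more likely than
    the observed count k under success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


# Hypothetical example: 20 of 40 breakpoints fall within annotated genes.
print(binom_two_sided(20, 40, 0.39))
```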

Inversion frequencies

Sampling populations across species range

To characterize the frequencies and distributions of the inversions across the P. maniculatus range, we included 3–46 individuals from each of an additional eight populations, which, when combined with the initial populations, yielded a total of 218 mice from 13 populations. For five of the new eight populations (populations a, b, f, i and k in Fig. 4a; see Supplementary Table 1 for sample details), we extracted DNA from liver tissue and performed whole-genome resequencing (~10–15× coverage) and variant calling as described above. We also performed whole-genome resequencing for 11 P. leucopus samples and two P. californicus samples (see Supplementary Table 1 for sample details), which we also included in our variant calling pipeline. For three additional populations (populations d, g and h in Fig. 4a; see Supplementary Table 1 for sample details), we obtained publicly available exome-sequencing data⁶¹ (NCBI: PRJNA528923) and mapped sequencing reads to the P. maniculatus reference genome with BWA-MEM. We then performed variant calling as described above, except that these samples were joint-genotyped separately from the whole-genome resequenced samples. We approximated the P. maniculatus range using the IUCN Red List of Threatened Species (https://www.iucnredlist.org/species/16672/22360898) and plotted the range map in R, as shown in Fig. 4a.

Phylogenetic trees

To reconstruct the evolutionary relationships among populations, we used RAxML⁶² to build maximum-likelihood trees. First, we created a tree for the five focal P. maniculatus populations and two outgroups (Peromyscus leucopus and P. californicus). Using hard-filtered SNPs from across the autosomes, we thinned SNPs to at most one SNP per 100 kb using vcftools and merged vcfs across chromosomes. We converted the merged vcf to a PHYLIP matrix using vcf2phylip.py (https://github.com/edgardomortiz/vcf2phylip) and removed invariant sites using ascbias.py (https://github.com/btmartin721/raxml_ascbias), resulting in a total of 12,292 SNPs. We then ran RAxML v.8.2.12 using the ASC_GTRCAT model with the conditional likelihood method, --asc-corr=lewis, to correct for the ascertainment bias due to using SNPs⁶³. We ran 100 bootstraps, with ‘-f a’ to perform rapid bootstrap analysis and visualized trees in iTOL⁶⁴. We next created a tree for all 13 P. maniculatus populations and the two outgroups. To do so, we first merged the variants called for the three exome-sequenced populations with the whole-genome resequenced vcfs and subset each population to at most 15 individuals. We removed variants with missing genotypes for >20% of samples and masked inversion regions using bcftools. We then converted the vcf to a PHYLIP matrix and removed invariant sites as described above, resulting in a total of 15,518 SNPs. We ran RAxML as described above, with 100 bootstraps using ‘-f a’ to perform rapid bootstrap analysis and visualized trees in iTOL.

Genotyping samples for inversions

To genotype individuals for the presence or absence of inversions, we used a PCA approach. For each inversion, we selected closely related populations segregating for the inversion of interest and performed PCA for that inversion region using scikit-allel, as described above. PCA was performed with only a subset of populations to allow for the inversion of interest (rather than population divergence) to drive variance along PC1. We then projected the remaining samples onto the PC1 and PC2 axes. We genotyped samples for each inversion based on loading scores along PC1 (along which samples clustered into inversion genotype groups) with manual determination of boundaries. We verified that samples called as inversion heterozygotes had elevated heterozygosity in the inversion region using the count_het function in scikit-allel. We set any populations with ambiguous clustering along PC1 for a given inversion to have missing genotypes. Finally, we determined inversion genotype frequencies for each population and tested for deviations from HWE using HWE.chisq in R from the genetics package.
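
The test for deviation from Hardy–Weinberg equilibrium (HWE) at the end of this procedure can be illustrated with a chi-square goodness-of-fit calculation on the three inversion genotype counts. This is a minimal Python sketch using the analytic chi-square distribution with one degree of freedom; the `HWE.chisq` function in the R genetics package may compute its p-value differently (for example, by simulation), so results need not match exactly. The function name is hypothetical.

```python
from math import erfc, sqrt


def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test of genotype counts against HWE.

    Returns (chi2, p_value). The p-value uses the chi-square distribution
    with 1 degree of freedom, whose upper tail equals erfc(sqrt(chi2 / 2)).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first arrangement
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic populations).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, erfc(sqrt(chi2 / 2))
```

A strong heterozygote deficit, such as counts of (50, 0, 50), yields a very small p-value, whereas counts matching the expected proportions yield a p-value of 1.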

We also determined inversion genotypes for: (1) 547 F₂ hybrids from the P. m. rubidus × P. m. gambelii cross; and (2) the 136 wild-caught mice from the environmental transect. To do so, we first created a set of SNPs fixed between the inversion and standard arrangements using homozygous samples from only forest (P. m. rubidus) and prairie (P. m. gambelii) populations, unless there were fewer than three homozygous samples per genotype, in which case we included additional homozygous samples from nearby populations (populations b and f, Fig. 4a) to improve filtering. Previously, the F₂ hybrids were sequenced using the ddRAD-sequencing pipeline (as described in Recombination rates, NCBI: PRJNA687993), and the 136 transect mice were whole-genome resequenced at low coverage (NCBI: PRJNA688305)²⁰. Using these sequencing data, we selected the fixed inversion-standard SNPs from bam files for the F₂ hybrids and transect mice using mpileup and performed the hidden Markov model step of the multiplexed shotgun genotyping pipeline⁶⁵ to determine genotype for each inversion.

Mutational load

To test whether the inversions were enriched for deleterious mutations compared with standard haplotypes, we analyzed the number of segregating nonsynonymous (pN) versus synonymous (pS) sites and nucleotide diversity at nonsynonymous (π_N) versus synonymous (π_S) sites using PopGenome⁶⁶. For each inversion, we selected samples homozygous for the inversion arrangement and used readVCF to import biallelic SNPs for the samples and inversion regions of interest into PopGenome; specifically, we selected homozygous samples from the major P. maniculatus clade (Fig. 4b; populations a, b, c, e, f, i and j) except for inv10.0 and inv11.0, for which we also included populations k, l and m in order to sample both homozygous genotypes. We then used the set.synnonsyn function with the P. m. bairdii genome annotation to determine nonsynonymous and synonymous sites. Next, we computed nucleotide diversity for each synonymous and nonsynonymous site with the diversity.stats function. Finally, for 500 kb windows across each inversion region, we calculated the ratios pN/pS and π_N/π_S (using only sites that were segregating within the homozygous sample set). We then repeated these analyses for samples homozygous for the standard arrangement. To test whether the inversion and standard haplotypes significantly differed in pN/pS or π_N/π_S, we performed two-sided t-tests in R. Inv7.1 was excluded from this analysis because we had sequencing data for only one homozygous inversion sample; inv9.1 was also excluded because it harbours only six genes.

We also tested whether the inverted or standard haplotypes were enriched for deleterious mutations compared to the rest of the genome. To do so, we included all samples from the major P. maniculatus clade (Fig. 4b; populations a, b, c, e, f, i and j) and calculated pN/pS and π_N/π_S for 500 kb windows across all regions genome-wide, excluding the inverted regions. We tested whether the inverted or standard haplotypes showed significantly higher pN/pS or π_N/π_S than genome-wide regions using one-sided t-tests in R.

SLiM simulations

To explore a possible role of selection on the inversions, we performed forward-genetic simulations in SLiM v.3.6 (ref. ³⁵). We simulated the forest (population c, P. m. rubidus) and prairie (population e, P. m. gambelii) populations evolving under a previously estimated best-fit demographic model²⁰ and introduced an inversion as a Mendelian locus as a single copy. We set separate selection coefficients for the inversion locus in the forest versus prairie populations, varying the selection coefficients from −0.01 to +0.01. We introduced the inversion into either the forest or the prairie population at five time points, corresponding to 1.5 × 10⁴, 1.5 × 10⁵, 7.5 × 10⁵, 1.5 × 10⁶ and 2.2 × 10⁶ generations ago, with 2.2 × 10⁶ being the estimated time of the forest–prairie split. To reduce computational time, we scaled parameters by a factor of 100, with population sizes (N) and times divided by 100 (for example, after scaling, time points ranged from 1.5 × 10² to 2.2 × 10⁴ generations ago) and migration rates (m) and selection coefficients (s) multiplied by 100 (for example, after scaling, selection coefficients ranged from −1.0 to +1.0), to keep Nm and Ns consistent³⁵. For each set of forest and prairie selection coefficients and each time point, we ran 1,000 simulations and recorded the frequency of the inversion in the forest and prairie populations at the end of the simulation. Finally, for each scenario, we computed the probability that the inversion reached an absolute allele frequency difference between the forest and prairie populations >50%. All selection coefficients are reported as their values before scaling.
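
The parameter scaling and the final summary statistic can be written out explicitly. The sketch below is Python rather than SLiM code, and the function names are hypothetical; the population size and migration rate in the usage line are placeholders, not the values of the fitted demographic model.

```python
def scale_parameters(N, generations, m, s, factor=100):
    """Rescale simulation parameters so that N*m and N*s are unchanged.

    Population size and time are divided by `factor`; migration rate and
    selection coefficient are multiplied by it.
    """
    return {
        "N": N / factor,
        "generations": generations / factor,
        "m": m * factor,
        "s": s * factor,
    }


def prob_differentiated(forest_freqs, prairie_freqs, threshold=0.5):
    """Fraction of replicates with |forest - prairie| inversion frequency > threshold."""
    hits = sum(abs(f - p) > threshold for f, p in zip(forest_freqs, prairie_freqs))
    return hits / len(forest_freqs)


# Placeholder N and m; the time and s are the unscaled extremes quoted in the text.
print(scale_parameters(N=100_000, generations=2.2e6, m=1e-5, s=0.01))
```

Here `prob_differentiated` would take the final inversion frequencies from the 1,000 replicate simulations of one scenario.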

Clinal variation

To test whether inversion frequency was associated with local habitat, we analyzed P. maniculatus mice previously collected across a forest–prairie environmental gradient, which included 136 samples from nine sites across the Cascade mountains in Oregon, USA²⁰. Using publicly available sequencing data²⁰ (NCBI: PRJNA688305), we genotyped the 136 samples for the inversions (see Genotyping samples for inversions section above) and then used the package HZAR v.0.2.5 (ref. ⁶⁷) to fit clines to inversion genotypes (https://github.com/oharring/chr15_inversion). We fit ten different cline models by varying the scaling of minimum and maximum allele frequencies (scaling: ‘fixed’ or ‘free’) and how exponential tails were fit (tails: ‘none’, ‘left’, ‘right’, ‘mirror’ and ‘both’). We selected the best model for each inversion using Akaike information criterion (with correction for small sample sizes) (AICc) values. Clines shown in Fig. 6b are fit with tails: ‘none’ and scales: ‘fixed’; best-fit clines are shown in Extended Data Fig. 8.
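
Model selection by AICc follows the standard formula AICc = 2k − 2 ln L + 2k(k + 1)/(n − k − 1), where k is the number of free parameters and n the number of observations. The Python sketch below illustrates the comparison only; it does not call HZAR, the function names are hypothetical, and what counts as an observation (n) is an assumption to be set by the user.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Akaike information criterion with small-sample correction."""
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


def best_model(fits, n_obs):
    """Return the name of the model with the lowest AICc.

    fits : dict mapping a model name to (log_likelihood, n_params)
    """
    return min(fits, key=lambda name: aicc(*fits[name], n_obs))


# Hypothetical log-likelihoods and parameter counts for two cline models.
fits = {"fixed_none": (-100.0, 2), "free_both": (-99.5, 8)}
print(best_model(fits, n_obs=136))
```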

Genotype–phenotype associations

Using data from a reciprocal intercross between P. m. rubidus (forest population) × P. m. gambelii (prairie population) F₂ hybrids (n = 547) as described above, we tested for associations between inversion genotype and three forest-ecotype-defining traits: tail length, foot length and coat colour. We used previously published phenotypic measurements²⁰ and the inversion genotypes reported here. For each of the 13 polymorphic forest–prairie inversions, we tested whether inversion genotype was significantly correlated with trait variation using linear models in R, with genotype coded numerically (additive genetic model); for tail and foot length, we included body length as a fixed effect. We corrected for multiple hypothesis testing (that is, testing 13 different inversions) using Bonferroni correction.
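
The Bonferroni step amounts to multiplying each p-value by the number of inversions tested (13) and capping the result at 1. A minimal Python sketch, with a hypothetical function name and made-up p-values:

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up p-values for 13 tests, one per inversion.
raw = [0.001] + [0.5] * 12
adjusted = bonferroni(raw)
significant = [i for i, p in enumerate(adjusted) if p < 0.05]
print(significant)
```

Equivalently, each raw p-value can be compared against 0.05/13.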

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Sequencing data are available from NCBI SRA under BioProject accessions PRJNA856879, PRJNA816517, PRJNA860096, PRJNA862503; NCBI SRA accessions for individual samples are listed in Supplementary Table 1. Source data are provided with this paper.

Code availability

The code used for the analyses is available from GitHub (https://github.com/oharring/pman_inversions).

References

Charlesworth, B., Charlesworth, D., Coyne, J. A. & Langley, C. H. Hubby and Lewontin on protein variation in natural populations: when molecular genetics came to the rescue of population genetics. Genetics 203, 1497–1503 (2016).
Mérot, C., Oomen, R. A., Tigano, A. & Wellenreuther, M. A roadmap for understanding the evolutionary significance of structural genomic variation. Trends Ecol. Evol. 35, 561–572 (2020).
Wellenreuther, M. & Bernatchez, L. Eco-evolutionary genomics of chromosomal inversions. Trends Ecol. Evol. 33, 427–440 (2018).
Lowry, D. B. & Willis, J. H. A widespread chromosomal inversion polymorphism contributes to a major life-history transition, local adaptation, and reproductive isolation. PLoS Biol. 8, e1000500 (2010).
Joron, M. et al. Chromosomal rearrangements maintain a polymorphic supergene controlling butterfly mimicry. Nature 477, 203–206 (2011).
Küpper, C. et al. A supergene determines highly divergent male reproductive morphs in the ruff. Nat. Genet. 48, 79–83 (2016).
Lamichhaney, S. et al. Structural genomic changes underlie alternative reproductive strategies in the ruff (Philomachus pugnax). Nat. Genet. 48, 84–88 (2016).
Thompson, M. J. & Jiggins, C. D. Supergenes and their role in evolution. Heredity 113, 1–8 (2014).
Bürger, R. & Akerman, A. The effects of linkage and gene flow on local adaptation: a two-locus continent–island model. Theor. Popul. Biol. 80, 272–288 (2011).
Kirkpatrick, M. & Barton, N. Chromosome inversions, local adaptation and speciation. Genetics 173, 419–434 (2006).
Charlesworth, B. & Barton, N. H. The spread of an inversion with migration and selection. Genetics 208, 377–382 (2018).
Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, eabf7117 (2021).
Porubsky, D. et al. Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders. Cell 185, 1986–2005 (2022).
Kapun, M. & Flatt, T. The adaptive significance of chromosomal inversion polymorphisms in Drosophila melanogaster. Mol. Ecol. 28, 1263–1282 (2019).
Mérot, C. et al. Locally adaptive inversions modulate genetic variation at different geographic scales in a seaweed fly. Mol. Biol. Evol. 38, 3953–3971 (2021).
Todesco, M. et al. Massive haplotypes underlie ecotypic differentiation in sunflowers. Nature 584, 602–607 (2020).
Porubsky, D. et al. Recurrent inversion toggling and great ape genome evolution. Nat. Genet. 52, 849–858 (2020).
De Coster, W., Weissensteiner, M. H. & Sedlazeck, F. J. Towards population-scale long-read sequencing. Nat. Rev. Genet. 22, 572–587 (2021).
Mérot, C. Making the most of population genomic data to understand the importance of chromosomal inversions for adaptation and speciation. Mol. Ecol. 29, 2513–2516 (2020).
Hager, E. R. et al. A chromosomal inversion contributes to divergence in multiple traits between deer mouse ecotypes. Science 377, 399–405 (2022).
Linnen, C. R. et al. Adaptive evolution of multiple traits through multiple mutations at a single gene. Science 339, 1312–1316 (2013).
Bedford, N. L. & Hoekstra, H. E. Peromyscus mice as a model for studying natural variation. eLife 4, e06813 (2015).
Bradshaw, W. N. & Hsu, T. C. Chromosomes of Peromyscus (Rodentia, Cricetidae). Cytogenetics 11, 436–451 (1972).
Article CAS PubMed Google Scholar
Sparkes, R. S. & Arakaki, D. T. Intrasubspecific and intersubspecific chromosomal polymorphism in Peromyscus maniculatus (deer mouse). Cytogenetics 5, 411–418 (1966).
Article Google Scholar
Li, H. & Ralph, P. Local PCA shows how the effect of population structure differs along the genome. Genetics 211, 289–304 (2019).
Article CAS PubMed Google Scholar
Huang, K., Andrew, R. L., Owens, G. L., Ostevik, K. L. & Rieseberg, L. H. Multiple chromosomal inversions contribute to adaptive divergence of a dune sunflower ecotype. Mol. Ecol. 29, 2535–2549 (2020).
Article CAS PubMed Google Scholar
Smalec, B. M., Heider, T. N., Flynn, B. L. & O’Neill, R. J. A centromere satellite concomitant with extensive karyotypic diversity across the Peromyscus genus defies predictions of molecular drive. Chromosome Res. 27, 237–252 (2019).
Article CAS PubMed PubMed Central Google Scholar
Alkan, C. et al. Genome-wide characterization of centromeric satellites from multiple mammalian genomes. Genome Res. 21, 137–145 (2011).
Article CAS PubMed PubMed Central Google Scholar
Kidd, J. M. et al. A human genome structural variation sequencing resource reveals insights into mutational mechanisms. Cell 143, 837–847 (2010).
Article CAS PubMed PubMed Central Google Scholar
Numanagić, I. et al. Fast characterization of segmental duplications in genome assemblies. Bioinformatics 34, i706–i714 (2018).
Article PubMed PubMed Central Google Scholar
Kirkpatrick, M. How and why chromosome inversions evolve. PLoS Biol. 8, e1000501 (2010).
Article PubMed PubMed Central Google Scholar
Berdan, E. L., Blanckaert, A., Butlin, R. K. & Bank, C. Deleterious mutation accumulation and the long-term fate of chromosomal inversions. PLoS Genet. 17, e1009411 (2021).
Article CAS PubMed PubMed Central Google Scholar
Jay, P. et al. Mutation load at a mimicry supergene sheds new light on the evolution of inversion polymorphisms. Nat. Genet. 53, 288–293 (2021).
Article CAS PubMed Google Scholar
Tuttle, E. M. et al. Divergence and functional degradation of a sex chromosome-like supergene. Curr. Biol. 26, 344–350 (2016).
Article CAS PubMed PubMed Central Google Scholar
Haller, B. C. & Messer, P. W. SLiM 3: forward genetic simulations beyond the Wright–Fisher model. Mol. Biol. Evol. 36, 632–637 (2019).
Article CAS PubMed PubMed Central Google Scholar
Hager, E. R. & Hoekstra, H. E. Tail length evolution in deer mice: linking morphology, behavior, and function. Integr. Comp. Biol. 61, 385–397 (2021).
Article PubMed PubMed Central Google Scholar
Nachman, M. W. & Searle, J. B. Why is the house mouse karyotype so variable? Trends Ecol. Evol. 10, 397–402 (1995).
Article CAS PubMed Google Scholar
Samonte, R. V. & Eichler, E. E. Segmental duplications and the evolution of the primate genome. Nat. Rev. Genet. 3, 65–72 (2002).
Article CAS PubMed Google Scholar
Greenbaum, I. F. & Reed, M. J. Evidence for heterosynaptic pairing of the inverted segment in pericentric inversion heterozygotes of the deer mouse (Peromyscus maniculatus). Cytogenet. Genome Res. 38, 106–111 (1984).
Article CAS Google Scholar
Sanchez-Donoso, I. et al. Massive genome inversion drives coexistence of divergent morphs in common quails. Curr. Biol. 32, 462–469.e6 (2022).
Article CAS PubMed Google Scholar
Fang, Z. et al. Megabase-scale inversion polymorphism in the wild ancestor of maize. Genetics 191, 883–894 (2012).
Article PubMed PubMed Central Google Scholar
Matschiner, M. et al. Supergene origin and maintenance in Atlantic cod. Nat. Ecol. Evol. 6, 469–481 (2022).
Article PubMed PubMed Central Google Scholar
Felsenstein, J. The evolutionary advantage of recombination. Genetics 78, 737–756 (1974).
Article CAS PubMed PubMed Central Google Scholar
Charlesworth, B. & Jensen, J. D. Effects of selection at linked sites on patterns of genetic variability. Annu. Rev. Ecol. Evol. Syst. 52, 177–197 (2021).
Article Google Scholar
Keightley, P. D. & Otto, S. P. Interference among deleterious mutations favours sex and recombination in finite populations. Nature 443, 89–92 (2006).
Article CAS PubMed Google Scholar
Nei, M., Kojima, K.-I. & Schaffer, H. E. Frequency changes of new inversions in populations under mutation-selection equilibria. Genetics 57, 741–750 (1967).
Article CAS PubMed PubMed Central Google Scholar
Huang, K. et al. Mutation load in sunflower inversions is negatively correlated with inversion heterozygosity. Mol. Biol. Evol. 39, msac101 (2022).
Faria, R. et al. Multiple chromosomal rearrangements in a hybrid zone between Littorina saxatilis ecotypes. Mol. Ecol. 28, 1375–1393 (2019).
Article CAS PubMed PubMed Central Google Scholar
Jones, F. C. et al. The genomic basis of adaptive evolution in threespine sticklebacks. Nature 484, 55–61 (2012).
Article CAS PubMed PubMed Central Google Scholar
Yeaman, S. & Whitlock, M. C. The genetic architecture of adaptation under migration–selection balance. Evolution 65, 1897–1911 (2011).
Article PubMed Google Scholar
Dobigny, G., Britton‐Davidian, J. & Robinson, T. J. Chromosomal polymorphism in mammals: an evolutionary perspective. Biol. Rev. 92, 1–21 (2017).
Article PubMed Google Scholar
Kingsley, E. P., Kozak, K. M., Pfeifer, S. P., Yang, D.-S. & Hoekstra, H. E. The ultimate and proximate mechanisms driving the evolution of long tails in forest deer mice. Evolution 71, 261–273 (2017).
Article PubMed Google Scholar
Wooldridge, T. B. et al. An enhancer of Agouti contributes to parallel evolution of cryptically colored beach mice. Proc. Natl Acad. Sci. USA 119, e2202862119 (2022).
Article CAS PubMed PubMed Central Google Scholar
Bendesky, A. et al. The genetic basis of parental care evolution in monogamous mice. Nature 544, 434–439 (2017).
Article CAS PubMed PubMed Central Google Scholar
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
Article CAS PubMed Google Scholar
Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).
Article CAS PubMed PubMed Central Google Scholar
Edge, P. & Bansal, V. Longshot enables accurate variant calling in diploid genomes from single-molecule long read sequencing. Nat. Commun. 10, 4660 (2019).
Article PubMed PubMed Central Google Scholar
Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).
Article PubMed PubMed Central Google Scholar
Tigano, A. et al. Chromosome size affects sequence divergence between species through the interplay of recombination and selection. Evolution 76, 782–798 (2022).
Article CAS PubMed PubMed Central Google Scholar
Long, A. D. et al. The genome of Peromyscus leucopus, natural host for Lyme disease and other emerging infections. Sci. Adv. 5, eaaw6441 (2019).
Article CAS PubMed PubMed Central Google Scholar
Schweizer, R. M. et al. Physiological and genomic evidence that selection on the transcription factor Epas1 has altered cardiovascular function in high-altitude deer mice. PLoS Genet. 15, e1008420 (2019).
Article PubMed PubMed Central Google Scholar
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
Article CAS PubMed PubMed Central Google Scholar
Leaché, A. D. et al. Short tree, long tree, right tree, wrong tree: new acquisition bias corrections for inferring SNP phylogenies. Syst. Biol. 64, 1032–1047 (2015).
Article PubMed PubMed Central Google Scholar
Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res. 47, W256–W259 (2019).
Article CAS PubMed PubMed Central Google Scholar
Andolfatto, P. et al. Multiplexed shotgun genotyping for rapid and efficient genetic mapping. Genome Res. 21, 610–617 (2011).
Article CAS PubMed PubMed Central Google Scholar
Pfeifer, B., Wittelsbürger, U., Ramos-Onsins, S. E. & Lercher, M. J. PopGenome: an efficient Swiss army knife for population genomic analyses in R. Mol. Biol. Evol. 31, 1929–1936 (2014).
Article CAS PubMed PubMed Central Google Scholar
Derryberry, E. P., Derryberry, G. E., Maley, J. M. & Brumfield, R. T. HZAR: hybrid zone analysis using an R software package. Mol. Ecol. Resour. 14, 652–663 (2014).
Article PubMed Google Scholar

Download references

Acknowledgements

We thank T. Sackton, D. Khost and members of the Hoekstra laboratory for their advice on the analyses; T. Sackton, J. Mallet, L. Gozashti, A. Kautt and members of the Mallet laboratory for providing helpful feedback on the manuscript; T. B. Wooldridge for sharing short-read sequencing data; and E. Hager and T. B. Wooldridge for many helpful discussions on inversions. The Bauer Core Facility at Harvard University provided short-read library preparation and sequencing services. The University of Washington PacBio Sequencing Core provided long-read library preparation and sequencing services. Computational analyses were run on the Odyssey and Cannon clusters supported by the Faculty of Arts and Sciences Research Computing Group at Harvard University. We thank the Museum of Southwestern Biology (University of New Mexico), Museum of Comparative Zoology (Harvard University), S. Cushman (US Forest Service, Rocky Mountain Research Station) and C. Thompson (University of Michigan) for providing specimens used in this study. O.S.H. was supported by a National Science Foundation Graduate Research Fellowship, a Harvard Quantitative Biology Student Fellowship (DMS 1764269), the Molecular Biophysics Training Grant (NIH NIGMS T32GM008313), an American Society of Mammalogists Grants-in-Aid of Research and a Society for the Study of Evolution R.C. Lewontin Early Award. H.E.H. is funded as a Howard Hughes Medical Institute Investigator.

Author information

Authors and Affiliations

Department of Organismic & Evolutionary Biology, Department of Molecular & Cellular Biology, Museum of Comparative Zoology and Howard Hughes Medical Institute, Harvard University, Cambridge, MA, USA
Olivia S. Harringmeyer & Hopi E. Hoekstra

Contributions

O.S.H. conceived the study and performed the analyses, with input from H.E.H. O.S.H. and H.E.H. wrote the manuscript.

Corresponding authors

Correspondence to Olivia S. Harringmeyer or Hopi E. Hoekstra.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Ecology & Evolution thanks Maren Wellenreuther and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Identifying inversion polymorphisms based on population genomic signatures.

For each identified inversion polymorphism, the following signatures of inversions are shown (colors correspond to focal population or population-pair in which inversion was identified, see legend): (1) Local PCA performed with lostruct, where each dot represents a 100-kb window. Distances between local PCA maps are represented by the MDS1 axis, with outlier windows highlighted in color. (2) Clustering of samples by PCA for entire outlier region found with local PCA, assigned using k-means clustering. (3) Heterozygosity (percent of sites that are heterozygous) of outlier region for samples by cluster assignments from PCA above. Boxes indicate upper and lower quartiles; center line represents median; whiskers extend to minimum and maximum values within 1.5x interquartile range; points show outliers beyond whiskers. Sample sizes are shown below the x-axis for each cluster. (4) LD for chromosomes harboring the example inversions, shown as mean r² values for paired windows across each chromosome. Upper triangle shows mean r² values including all samples from PCA clustering. Lower triangle shows mean r² values for only the more common homozygote genotype as determined in PCA clustering. Colored bars highlight outlier region from local PCA. Scales for r² values provided. (5) Recombination rates in cM/Mb shown for lab-born inversion heterozygotes. Outlier region found in local PCA is highlighted. Five inversions have missing data since inversion heterozygotes were not measured in the lab. (6) F_ST between homozygous genotypes (clusters 1 and 3 from PCA and heterozygosity plots). Outlier regions found in local PCA are highlighted. Note that the discontinuity for inv23.0 is likely due to reference genome mis-assembly.
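
As an illustration of signature (3), the following minimal Python sketch shows one way percent heterozygosity per sample could be summarized by PCA cluster. The genotype encoding (alternate-allele counts 0, 1, 2, with None for missing calls), the function names and the toy data are assumptions made for illustration; this is not the authors' pipeline.

```python
from statistics import median


def percent_heterozygous(genotypes):
    """Percent of called sites that are heterozygous for one sample.

    genotypes: alternate-allele counts per site (0, 1 or 2), None if missing.
    This encoding is an assumption of the sketch.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("sample has no called genotypes")
    return 100.0 * sum(1 for g in called if g == 1) / len(called)


def median_heterozygosity_by_cluster(samples, clusters):
    """Median percent heterozygosity for each cluster label."""
    by_cluster = {}
    for genotypes, cluster in zip(samples, clusters):
        by_cluster.setdefault(cluster, []).append(percent_heterozygous(genotypes))
    return {cluster: median(values) for cluster, values in by_cluster.items()}


# Hypothetical toy data: five samples, four sites in the outlier region.
samples = [
    [0, 0, 0, 0],     # cluster 1 (homozygote)
    [0, 0, 1, 0],     # cluster 1 (homozygote)
    [1, 1, 1, 0],     # cluster 2 (heterozygote)
    [1, 1, 1, 1],     # cluster 2 (heterozygote)
    [2, 2, 2, None],  # cluster 3 (homozygote, one missing call)
]
clusters = [1, 1, 2, 2, 3]
result = median_heterozygosity_by_cluster(samples, clusters)
```

In this toy example the middle cluster has much higher heterozygosity than the two outer clusters, which is the pattern expected when the middle PCA cluster corresponds to inversion heterozygotes.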

Extended Data Fig. 2 Recombination effects of inversion heterozygotes.

(a) Recombination rates for inversion versus non-inversion regions from lab-born F₂ hybrids. Recombination rates for inversion regions are measured in inversion heterozygotes only; rates for non-inversion regions include all lab-born F₂ hybrids. Boxes indicate upper and lower quartiles; center line represents median; whiskers extend to minimum and maximum values within 1.5× the interquartile range. Points for inversion regions represent inversions (n = 15); points for non-inversion regions represent chromosomes (excluding inversion regions) (n = 23). Recombination rate for inversion regions = 0.01 ± 0.03 cM/Mb; non-inversion regions = 0.80 ± 0.34 cM/Mb (mean ± s.d.). (b) Recombination rates for inversion regions by inversion size in megabases. Linear fit between inversion length and recombination rate shown as red line (F-statistic on 1 and 13 degrees of freedom, p > 0.05).
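
The degrees of freedom reported for panel (b) follow from a simple linear regression: with n points, the F-statistic has 1 and n − 2 degrees of freedom, so 15 inversions give 1 and 13. A minimal Python sketch of that calculation is given here; the function name and the toy values are hypothetical and are not taken from the study's data.

```python
def linear_fit_f_statistic(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, f_statistic, df_model, df_residual).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    ssr = sst - sse
    df_model, df_residual = 1, n - 2
    f_stat = float("inf") if sse == 0 else (ssr / df_model) / (sse / df_residual)
    return slope, intercept, f_stat, df_model, df_residual


# Hypothetical toy data (not from the study).
slope, intercept, f_stat, df1, df2 = linear_fit_f_statistic(
    [0, 1, 2, 3], [0, 1, 1, 2]
)
```

Converting the F-statistic to a p value requires the F distribution, which is omitted from this sketch.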

Extended Data Fig. 3 Localizing inversion breakpoints.

Contigs highlighting breakpoints for 13 inversions. Contigs from de novo genome assemblies (‘query’, y-axis) were aligned to the P. maniculatus reference genome (‘reference’, x-axis) with nucmer. Populations corresponding to the de novo assembly used in each plot are given; for inv9.0 and inv20.0 inversions, breakpoints were localized by aligning P. leucopus and P. californicus reference genomes to the P. maniculatus reference genome, respectively. Contigs (gray) and those identifying inversion breakpoints (red) are shown. Predicted inversion boundaries are highlighted (orange box). For the inv7.2 plot, the pink contig highlights a derived inversion (inv7.3) in the reference genome; when the reference genome is re-oriented to the ancestral state, the contig highlighted in red shows the inv7.2 inversion breakpoints.

Extended Data Fig. 4 Inverted repeats and segmental duplications.

Examples of inversion breakpoints near large inverted repeats (inv9.0, inv15.2, inv20.0) and segmental duplications (inv9.0, inv15.2, inv20.0, inv18.0). Dotplots show alignments for long-read assembly contigs spanning or nearly spanning breakpoints. Self-v-self alignments are highlighted (gray boxes), with alignments between breakpoint regions (within 500 kb of breakpoints) shown in upper left quadrant for inv9.0, inv15.2 and inv20.0. Location of breakpoints (red arrows) shown; only alignments with length >100 bp and within 500 kb of the breakpoints are shown. Inverted repeats mapping to within 500 kb of both breakpoints are shown (red) and highlighted (yellow boxes). Self-v-self alignments also show segmental duplications near breakpoints.

Extended Data Fig. 5 Distributions of inversions across species range.

Genotype frequencies shown across species range for each inversion (for 13 populations shown in Fig. 4a). Inversions were genotyped with PCA; populations with ambiguous genotypes for a given inversion are not included.

Extended Data Fig. 6 Mutational load by inversion heterozygote frequency.

The mean mutational load per inversion, as measured by pN/pS and π_N/π_S, is shown versus the frequency of inversion heterozygotes. Neither pN/pS nor π_N/π_S is significantly correlated with the frequency of inversion heterozygotes (linear fits, shown as blue lines with 95% confidence intervals as gray shading; F-statistics on 1 and 17 degrees of freedom; p = 0.75 (top), p = 0.53 (bottom)).

Extended Data Fig. 7 Forward genetic simulations of selection on inversions.

The evolution of an inversion was simulated in SLiM under a best-fitting demographic model of the forest and prairie populations. The inversion locus was introduced into the populations at five timepoints (shown in generations ago); the final timepoint (2.2 million generations) represents the forest-prairie split time estimate. Simulations for a range of selection coefficients for the inversion were performed, with 1,000 simulations per scenario. The selection scenarios are shown (upper left). Heatmaps show the probability of the inversion reaching a forest-prairie allele frequency difference >50% for each combination of selection coefficients.
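
Each heatmap cell can be read as the fraction of replicate simulations in which the inversion ends with a forest-prairie frequency difference above 50%. The Python sketch below illustrates that summary step only. It assumes each replicate is reduced to a pair of final inversion frequencies and that the difference is taken as an absolute value; both are assumptions of this sketch, and the function name and toy values are hypothetical.

```python
def prob_frequency_difference(replicates, threshold=0.5):
    """Fraction of replicates with |p_forest - p_prairie| > threshold.

    replicates: list of (p_forest, p_prairie) final inversion frequencies.
    Using the absolute difference is an assumption of this sketch.
    """
    if not replicates:
        raise ValueError("need at least one replicate")
    hits = sum(
        1 for p_forest, p_prairie in replicates
        if abs(p_forest - p_prairie) > threshold
    )
    return hits / len(replicates)


# Hypothetical outcomes of four replicates for one pair of selection coefficients.
toy_replicates = [(0.9, 0.1), (0.6, 0.4), (0.0, 0.0), (1.0, 0.45)]
prob = prob_frequency_difference(toy_replicates)
```

Replicates in which the inversion is lost from both populations, such as (0.0, 0.0), count as not reaching the threshold.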

Extended Data Fig. 8 Clinal variation in inversion frequencies.

Inversion genotype frequencies shown across an environmental transect, with clines fit using hzar. Best-fit clines with 95% credible cline region shown. Sampled populations are highlighted (black crosses), with focal forest (left-most) and prairie (right-most) populations. Major allele in forest population is plotted, with y-axis label indicating inversion or standard haplotype.

Extended Data Table 1 Summary statistics for de novo genome assemblies with PacBio long-read sequencing

Extended Data Table 2 Inversion genotypes for long-read samples

Supplementary information

Supplementary Information

Supplementary Table 1.

Reporting Summary

Peer Review File

Source data

Statistical source data are provided for Figs. 1–6 and Extended Data Fig. 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Harringmeyer, O.S., Hoekstra, H.E. Chromosomal inversion polymorphisms shape the genomic landscape of deer mice. Nat Ecol Evol 6, 1965–1979 (2022). https://doi.org/10.1038/s41559-022-01890-0

Received: 06 May 2022
Accepted: 17 August 2022
Published: 17 October 2022
Issue Date: December 2022
DOI: https://doi.org/10.1038/s41559-022-01890-0
