Genomes, expression profiles, and diversity of mitochondria of the White-footed Deermouse Peromyscus leucopus, reservoir of Lyme disease and other zoonoses

The cricetine rodents Peromyscus leucopus and P. maniculatus are key reservoirs for several zoonotic diseases in North America. We determined the complete circular mitochondrial genome sequences of representatives of 3 different stock colonies of P. leucopus, one stock colony of P. maniculatus and two wild populations of P. leucopus. The genomes were syntenic with that of the murids Mus musculus and Rattus norvegicus. Phylogenetic analysis confirmed that these two Peromyscus species are sister taxa in a clade with P. polionotus and also uncovered a distinction between P. leucopus populations in the eastern and the central United States. In one P. leucopus lineage four extended regions of mitochondrial pseudogenes were identified in the nuclear genome. RNA-seq analysis revealed transcription of the entire genome and differences from controls in the expression profiles of mitochondrial genes in the blood, but not in liver or brain, of animals infected with the zoonotic pathogen Borrelia hermsii. PCR and sequencing of the D-loop of the mitochondrion identified 32 different haplotypes among 118 wild P. leucopus at a Connecticut field site. These findings help to further establish P. leucopus as a model organism for studies of emerging infectious diseases, ecology, and in other disciplines.

www.nature.com/scientificreports www.nature.com/scientificreports/ mtDNA sequence in its circular configuration (Fig. 1). The insertion included the 3′ end of the D-loop region with some substitutions and beyond that pseudogenes of 12S and 16S ribosomal RNAs, NADH dehydrogenase subunits 1 (ND1) and 2 (ND2), and cytochrome c oxidase subunit 1 (CO1). While the truncated ribosomal RNA pseudogene sequences were near identical over their lengths to their mitochondrial genome counterparts, the remnants of the protein coding sequences had multiple frame shifts, substitutions, indels, and internal stop codons.
The second longest insertion, at 4.3 kb, represented positions 8081 to 12,421 of the mtDNA and occupied positions 166,005 to 170,315 of the minus strand of chromosome 14 (Fig. 3B). The nuclear sequence was diminished in length by about one kilobase but was otherwise co-linear with the mitochondrial genome sequence. There were multiple frame shifts, substitutions, short indels, and internal stop codons in the pseudogenes. While the sequence for NADH dehydrogenase subunit 5 (ND5) was identical to the mtDNA over several hundred positions, it was truncated at the 3′ end. Another two insertions, one of 3.9 kb and another of 2.6 kb, included pseudogenes of ND1, ND2, and CO1 and were located on chromosome X (minus strand 103,573,537-103,577,435) and chromosome 1 (minus strand 110,558,850-110,561,436), respectively. All the insertions were in single copy at their locations.
Expression profiles in different tissues and during infection. Following the example of Torres et al. 17 , we applied RNA-seq to assess transcription of the 2 ribosomal RNA genes and the 13 protein coding sequences of the P. leucopus mitochondrion, in this case under different conditions of infection. First examined were the RNA-seq datasets from our study of P. leucopus with (n = 6) or without (n = 4) infection with the Lyme disease agent Borreliella burgdorferi 13 . The mitochondrial genome was not included with the nuclear genome for that earlier study of differential gene expression (DEG) during infection. As Table 1 indicates there were no discernible differences in the expression profiles for these coding sequences in the blood of animals with and without active infection.
For the present study we carried out an experimental infection of adult P. leucopus with the relapsing fever agent Borrelia hermsii. The MTW strain is a member of the clade that utilizes P. maniculatus as a natural reservoir in areas of its distribution, and experimental infections of Peromyscus have been achieved 9 . Five animals were infected by injection and 3 animals received the buffer alone on day 0 and were euthanized on day 4, at which time blood, spleens, livers, and brains were collected. Infection was confirmed in the 5 animals by microscopic examination of the blood and by quantitative PCR of the DNA extracted from the spleen. Whole blood, liver, and brain were subjected to RNA extraction, followed by stranded cDNA synthesis, and then paired-end 100 nt read sequencing.
Of these three sources of samples, only RNA from blood showed DEGs with false discovery rates (FDR) of <0.05 (Table 1). These were limited to 12S (RNR1) and 16S (RNR2) ribosomal RNA, which were three-fold less abundant in infected animals, and CO1, which was half again more abundant in infected animals than control www.nature.com/scientificreports www.nature.com/scientificreports/ animals. The coefficient of determination (R 2 ) between RNR1 and RNR2 TPM values for individual animals, both infected and control, was 0.98. Figure 4 with box plots of normalized RNA abundances (panel A) and scatter plots of corresponding t-test p values (panel B) shows where the infected and control animals differed at each of the non-overlapping 100 nt windows over the length of the mtDNA. This analysis largely matches with the findings with individual coding sequences of Table 1    www.nature.com/scientificreports www.nature.com/scientificreports/ Thereafter, we carried out discovery of D-loop polymorphisms by PCR amplification using custom primers and direct sequencing. By this method we first determined the haplotype sequence of 10 females and 10 males of the closed stock colony of the PGSC. Like the reference genome animal, all were haplotype 1. Review of each of the trace files for the Sanger dideoxy sequences of the PCR products did not reveal detectable heteroplasmy at any of the 886 positions for these animals. We then applied the analysis to stored blood clots collected from wild P. leucopus trapped in areas totaling ~3.6 ha in mixed hardwood forest on the east side of Lake Gaillard in eastern Connecticut, and described in reports by Tsao et al. 19 and Bunikis et al. 20 . Figure 5 is a map of the study area with the locations of the trapping site 1 and the two contiguous sites 2 and 3 approximately 1.5 km distant in the same habitat. The animals were tagged before release, and consequently recaptures of the same animal could be identified.
DNA extracts of 118 different P. leucopus from the Lake Gaillard site had 32 different D-loop haplotypes (Fig. 6). For each of 10 of the 118 animals there were at least additional blood samples from a second or third trapping period; the D-loop sequences were identical between the pairs of replicates. The lack of identity or close similarity of any of the sequences with homologous sequences that have been deposited as P. maniculatus at  The 886 nt alignment of the 32 sequences had 51 polymorphic sites, of which 22 were parsimony informative and 29 were singletons. All informative sites were between positions 90 and 759. Diversity peaked around positions 125 and 600, which correspond with HVR1 and HVR2. Four pairs differed between respective mates at a single position. These were assigned the same haplotype number and were distinguished by an appended letter, e.g. 2 A and 2B. Other haplotypes differed at ≥2 positions. Haplotype 3, first identified in the RML colony, occurred in 3 sampled animals at the Lake Gaillard site. The RML colony was founded with animals captured on Shelter Island, N.Y., presently separated by the Long Island Sound from the Connecticut shore, but part of the Atlantic Coastal Plain during the Last Glacial Maximum 21 . Haplotype 2 A of the Lake Gaillard animal LG1, whose full mtDNA sequence is described above, was identified in three other animals from the site. Haplotype 19 sequence was identical to that of a P. leucopus (accession number AY540360) trapped on Little Deer Island, ME 22 . Haplotype 1, which characterizes the PGSC colony originating near Linville, NC, was not observed among the 118 haplotyped animals in Connecticut. Figure 6 is a distance phylogram of the 32 D-loop haplotypes from Lake Gaillard, along with the two haplotypes, 40 and 41, of animals from the north-central U.S. and 7 representative haplotypes of subspecies bairdii and gracilis of P. maniculatus. The magnitude of diversity manifest in a population of P. leucopus inhabiting the 3.6 ha sampled was comparable to that noted within subspecies of P. maniculatus. Table 2 gives the distribution of the 28 haplotypes among 78 animals, for which the trap site was recorded: 40 animals at site 1 and 38 animals at contiguous sites 2 and 3. The frequencies of individual haplotypes over the combined trapping areas approximated a Poisson model with lambda of 2.8. While the most frequent haplotype, 8, occurred about equally between the two areas, only five other haplotypes were observed at the two locations. The next most frequent haplotypes, 7 and 13, were only detected at site 1, and the five haplotype 5 were only at sites 2/3. As expected for an animal with a home range diameter of approximately 50-60 m with a standard deviation of ~20 m 23 , the distribution of haplotypes between the two trapping areas ~1.5 km apart was not random (exact Likelihood Ratio statistic = 78.0; d.f. 27; 2-sided p = 2 × 10 −7 ).

Discussion
We determined the complete mitochondrial genomes of three stock colony and two wild P. leucopus with origins in the southeastern, northeastern, and north-central U.S. While full mtDNA sequences for several Peromyscus spp. existed in the public database, North America's two most abundant and widely-distributed peromyscines, P. leucopus and P. maniculatus, were not among them. There is an historical irony to this late attention. Four decades previously Avise et al. used restriction enzyme fragment polymorphisms of P. leucopus and P. maniculatus to confirm the maternal inheritance of mitochondria and demonstrate the value of the haploid mtDNA for population genetics 24 .
For DNA sequencing, we extracted DNA from whole liver tissue and not from a mitochondria-enriched subcellular fraction. Nonetheless, we think that the reported sequences for the mitochondrion of the index animal and other representatives of the species are accurate for these reasons: (1) By de novo assembly and mapping to a reference, the fold-coverages in the tens of thousands for a single contig and the mapping consensus were consistent with high numbers of this organelle in each liver cell. (2) Following the example of Tian et al. 25 , the same sequence was obtained as a single contig, and with similar high coverage, by assembly of cDNA reads and mapping of these reads along the entire length of the molecule. (3) The de novo assembly contig was circularly  www.nature.com/scientificreports www.nature.com/scientificreports/ permuted, as expected for a mammalian mitochondrion. (4) While there are dispersed mitochondrial pseudogenes in the nuclear genome, these sequences are sufficiently diverged from aligned portions of the consensus mtDNA to be readily distinguished by criteria of mismatches, indels, and completeness. Moreover, by counts of matched read these pseudogene sequences were in copy numbers consistent with nuclear genome locations rather than the organelle. (5) Direct PCR amplification and sequencing of the D-loop region from the index animal and twenty others from the same closed colony confirmed the consensus mtDNA sequence in that polymorphic region.
The RNA-seq analysis not only confirmed the mtDNA sequence, it also revealed differences in RNA abundances between the coding sequences and along the lengths of the molecule, as found by Torres et al. in Drosophila 17 and by Neira-Oviedo et al. in mosquitoes 26 . While the expression profiles of animals infected with B. burgdorferi could not be distinguished from those of uninfected animals, infection with a pathogen that achieves higher bacterial densities in the blood revealed decreased abundances of the 12S and 16S ribosomal RNAs in blood of B. hermsii-infected animals. Because of the presence of partial pseudogenes of these two loci in the nuclear genome, we cannot exclude the contribution of this chromosomal site to the pool of RNA in the extracts. But given the much lower copy number for this pseudogene locus in the cell than for mitochondria, this would likely be a marginal addition. In any case, the differences in profiles between infected and uninfected were limited to the blood. They did not extend to the liver and brain tissues, most cells of which were in less direct contact with the bacteria than were white blood cells 27 . Given the attribution of P. leucopus' longevity in part to mitochondrial www.nature.com/scientificreports www.nature.com/scientificreports/ traits by some investigators 28,29 , further studies of expression profiles under a variety of other stressful conditions besides infection appear warranted.
In contrast to P. maniculatus, whose subspecies affiliations, such as bairdii for the prairie deermouse, are often applied in articles and reports, subspecies identities of P. leucopus are uncommonly noted. This inattention to subspecies distinctions may be because the human diseases for which P. leucopus is a reservoir are largely restricted to the range of just one subspecies, namely noveboracensis in the northeastern, mid-Atlantic, and north-central U.S. 7,30 . All of the isolates in the present study originated within reported range of this subspecies. The D-loop haplotype of the inbred lineage GS16A1, which derives from a pair captured in Illinois, is identical to that of an animal (accession number GU810297) from the Great Lakes area and identified by Moscarella as noveboracensis 31 . Future studies should include representatives of other subspecies in North America, especially those in the southwestern U.S., where animals identified as subspecies leucopus on morphologic grounds had karyotypes that were distinct from those of noveboracensis in the northeastern U.S. 32 .
Our application of mitochondrial haplotyping was intended not as a comprehensive population structure study but rather as an in-depth survey of the diversity in mitochondria among a population of P. leucopus in a single habitat, chosen because it was typical of areas with substantial enzootic transmission of B. burgdorferi and other I. scapularis-borne pathogens. To that extent it succeeded, albeit with samples collected in 1999 and not necessarily representative of the present population. That limitation acknowledged, the study confirmed the informativeness of the D-loop sequence for haplotyping in this species 18,31 , and the ease with which this can be assessed with the specific PCR and sequencing primers. While the cytochrome c oxidase subunit 1 (CO1) sequence is commonly used for genotyping, the demonstration of a CO1 pseudogene in the nuclear genome cautions against selection of this locus for P. leucopus. The considerable diversity at the study site cautions us against small sample sizes at collecting areas for phylogeographic inferences.
In contrast to the variety of mitochondria haplotypes in the population of wild P. leucopus, there was only one haplotype observed in the animals from PGSC stock colony, for which there were originally 38 founders. As revealed by low-coverage sequencing, the outbred PGSC colony exhibits diversity that approaches the wild population of animals captured at Lake Gaillard 13 . But, as demonstrated by Willoughby et al. 18 , for closed colonies of P. leucopus, even with assiduous avoidance of sib-sib mating, there can be loss of mitochondrial haplotypes, eventually down to one, over the succeeding generations.   Table S1 of Supplementary Materials provides the corresponding National Center for Biotechnology Information (NCBI) (http://ncbi.nlm.nih.gov) BioProject and BioSample identifying numbers and descriptions for these samples.

Stock colony animals. Adult outbred
Field study animals. The field study from which the P. leucopus were captured and blood samples were obtained was described by Bunikis et al. 34 and Tsao et al. 19 and under a Yale University institutional animal care and utilization committee (IACUC) approved protocol (#07596). The methods were carried out in accordance with National Institutes of Health guidelines and federal regulations. The site was a 1,400 ha mixed hardwood forest on private water company property surrounding Lake Gaillard near New Haven, CT (Fig. 5). In brief, there were three trapping grids: sites 1, 2, and 3, each 108 m x 108 m (1.2 ha). Trapped animals were tagged, bled and then released at the site of capture, as described 19,34 . After serum was removed, the residual blood clots were labeled and stored at −20 °C. For the present study we used frozen samples from 1999. Because sites 2 and 3 were close to one another, we combined the results from these two sites for a comparison with site 1, approximately 1.5 km away. One of the blood samples from the Lake Gaillard population and designated as LG1 was the source of DNA for complete sequencing of the mitochondrion.

Experimental infections.
The protocol (AUP-18-020) was approved by the IACUC of the University of California Irvine. The methods were carried out in accordance with National Institutes of Health guidelines and federal regulations. The 6 P. leucopus LL stock infected with Borreliella burgdorferi strain Sh-2-82 and the 4 mock-infected controls were described by Long et al. 13 . After 5 weeks animals were euthanized with carbon dioxide anesthesia and terminal exsanguination. Whole blood was dispensed into heparin-coated tubes and thereafter processed as described 13 . For the present study 5 adult male P. leucopus LL stock animals were infected with 10 3 spirochetes of Borrelia hermsii strain MTW by intraperitoneal of subcutaneous infection divided into 50 µl volumes of phosphate-buffered saline (PBS). The isolate was provided by Tom G. Schwan of the National Institute of Allergy and Infectious Diseases' Rocky Mountain Laboratories. Three other male animals were mock-infected Pman_BW2  Pman_BW  40  41  1  16  12  13  11  24B  24A  8  3  26  14  18  21  25  22  19  5B  5A  29  23  28  27  7  20B  20A  17  9  15  30  6 Table S1 or in Methods. The scale for distance by criterion of observed differences is indicated. Percent bootstrap (1000 iterations) support values of ≥90% at a node are shown. by injection of PBS alone. The B. hermsii had first been propagated in CB17 Severe Combined Immunodeficiency (SCID) mice (Charles River Laboratories) and then harvested as described 35 . Bacteremia in the mice was confirmed by microscopic examination of a wet mount of the blood under phase microscopy on day 4 post-injection and then all mice were euthanized with collection of whole blood, spleen, and brain tissues, which were flash frozen in liquid nitrogen on day 5. Infection at time of collection was confirmed by quantitative PCR of extracted DNA of the spleen as described 36 . Extraction of DNA. Extraction of DNA from liver and kidney tissue for genome sequencing of P. leucopus LL stock was described by Long et al. 13 . Extraction of DNA from liver tissue of P. leucopus GS16A1 strain and RML stock and P. maniculatus BW stock DNA for library preparation and from frozen whole blood, blood clots, and spleen for PCR was carried out with the DNeasy ™ Blood and Tissue Kit with Proteinase K (Qiagen). Extraction of RNA. Extraction of total RNA from blood of P. leucopus infected with B. burgdorferi was described by Long et al. 13 . For extraction of total RNA from freshly obtained and chilled anticoagulated blood was mixed with an equal volume of Buffer EL of the QIAamp RNA Blood Mini Kit (Qiagen) and then processed with the kit and Qiagen QIAshredder spin columns according to the manufacturer's protocol. For extraction of RNA from frozen liver the Qiagen TissueLyser instrument with 3 mm stainless steel beads and the RNeasy Mini Kit (Qiagen) was used. For extraction of RNA from frozen brain tissue the tissue was first extracted with the TRIzol Reagent (Invitrogen) and the Qiagen TissueLyser before continuing proceeding with extraction with the RNeasy kit. Nucleic acid concentrations were determined with a Qubit fluorometer (ThermoFisher Scientific). The quality of the extracted RNA assessed by Agilent 2100 BioAnalyzer with the Nano RNA chip (Agilent). The RNA was stored in RNase-free distilled water at −80 °C.
DNA sequencing and assembly. The procedures for sequencing of the genome of the P. leucopus LL stock were described by Long et al. 13 . For assembly of the mitochondrion sequence only Illumina reads of the LL stock animal were used. For sequencing of the mitochondrial genomes of the GS16A1, RML, and LG1 P. leucopus, as well as the BW stock P. maniculatus total genomic DNA was subjected to library construction and then 150 cycles of paired-end chemistry on an Illumina HiSeq. 4000 instrument. De novo assembly was carried out  www.nature.com/scientificreports www.nature.com/scientificreports/ using Assembly Cell of CLC Genomics Workbench v. 11 with settings of mismatch, insertion, and deletion costs of 3 each, length fraction of 0.6, similarity fraction of 0.95, word size of 25, and bubble size of 50. Paired-end reads were also mapped to the P. polionotus mitochondrion genome (accession KY707301) as the reference. Thereafter Illumina reads for other P. leucopus animals were both de novo assembled and mapped to the LL stock sequence. Assembly of the complete sequence of a mitochondrion of a second Illinois origin P. leucopus (designated IL1) was carried out as described above using reads in the public NCBI Sequence Read Archive, and then we carried out a third-party annotation. Table S1 in Supplementary Materials gives the accession numbers of sequences determined for this study and access information for the corresponding sequence read archives. These other sequences of D-loop regions were accessed from GenBank: P. maniculatus subspecies bairdii (accession numbers EU140795, GU810358, and GU810358) and gracilis (GU810365, GU810368, and GU810373).

RNA-seq.
Library preparation with the Illumina TruSeq mRNA stranded kit was carried out as described 13 .
The libraries were normalized and then multiplexed to achieve 12 samples per flow cell on an Illumina HiSeq. 4000 instrument and 100 cycles of paired-end read chemistry at the U.C. Irvine Genomic High Throughput Facility. The quality of sequencing reads was analyzed using FastQC (Babraham Bioinformatics). The reads were trimmed of low-quality reads (Phred score < 15) and adapter sequences, and corrected for poor-quality bases using Trimmomatic 37 . The analysis was carried out with the suite of tools of CLC Genomics Workbench v. 11 (Qiagen). Paired-end reads were mapped with a length fraction of 0.7 and similarity fraction of 0.9 to both strands of coding sequences for mitochondrial ribosomal 12S and 16S RNA and the proteins. Paired reads that mapped were counted as two. The normalized expression values across samples were in transcripts per million (TPM) 38 . Differential expression between experimental conditions was assessed with an assumption of a negative binomial distribution for expression level and a separate Generalized Linear Model for each 39 . The False Discovery Rate (FDR) with corrected p value was estimated by the method of Benjamini and Hochberg 40 .

Sequence analysis.
Nucleotide diversity (π) over sliding windows was calculated with the DnaSP v. 5.1 program for polymorphism analysis (http://www.ub.es/dnasp). Average nucleotide identity 41 was determined with a web-based ANI calculator (http://enve-omics.ce.gatech.edu/ani/index). Sequence alignment and phylogenetic tree building were carried out with the SeaView v. 4.5 suite 42 . For assessment of heteroplasmy the tool Low Frequency Variant Detection v. 2 of CLC Genomics Workbench v. 11 was used. The BLAST-like Alignment Tool (BLAT; Kent Informatics, Inc.) was used to search the P. leucopus LL genome on the University of California Santa Cruz genome browser (http://googl/LwHDr5) 13 . Statistics. The exact Likelihood Ratio test of an unordered table for non-parametric inference was performed with StatXact v. 6.3 (Cytel). Box plot graphs were made with SYSTAT v. 13.1 software (Systat Software, Inc.).

Data availability
The descriptions and publicly-available database accession numbers of the assembled DNA sequences and the corresponding archived sequence reads for RNA-seq studies are given in Supplementary Materials Table S1.