Introduction

Zoonotic disease emergence is often sporadic and rare, making it challenging to predict, but it can have devastating consequences when outbreaks occur1. Wild animals play a critical role in the emergence of viral zoonoses as major reservoirs for transmitting zoonotic pathogens to humans and/or domestic animals2. Additionally, they facilitate pathogen dispersion through long-distance migration and international trade, while providing opportunities for pathogen evolution through host switching and genetic exchange3,4.

Among wild animals, shrews are significant insectivores belonging to the family Soricidae (Mammalia: Eulipotyphla), which encompasses 26 genera comprising 385 species worldwide5. As the largest eulipotyphlan family and the fourth-largest mammalian group, shrews have an extensive evolutionary history dating back 48–41.3 million years6. Due to their proximity to humans, domestic animals, and other wild animals, viruses harbored by shrews pose a potential threat to human health. Similar to bats, which act as important hosts for many zoonotic viruses, shrews carry a high diversity of viruses and can serve as natural reservoirs for zoonotic pathogens such as hantavirus, Langya henipavirus (LayV), rotavirus A, and Wenzhou mammarenavirus7,8,9,10,11,12. However, despite these medical and public health implications, shrews remain one of the least explored groups of wild animals in terms of viral diversity. The transmission dynamics of shrew-associated viruses, along with their ecological drivers, remain poorly understood.

The eastern coast of China, encompassing cities in Shandong, Jiangsu, Zhejiang, Fujian and Guangdong provinces, is renowned for a diverse array of natural ecological habitats that provide a suitable environment for a rich biodiversity of mammals, plants, invertebrates, and microbes. These regions host a significant abundance of wild animals, which often coexist with farm animals within their habitat. This coexistence provides ample opportunities for the transmission of pathogens to the latter. The current study was designed to decipher the virome present in shrews and to investigate viral prevalence across different shrew species and geographic locations. A comprehensive understanding of these aspects might be valuable for conducting risk assessments and providing early warnings regarding the potential emergence or re-emergence of zoonotic pathogens.

Results

Overview of shrew virome

Virome analysis was conducted on a total of 398 shrews captured from five provinces, as shown in Fig. 1 (detailed information is listed in Supplementary Table 1). None of these shrews exhibited physical signs of disease. The shrews were determined to belong to six species of four genera within the family Soricidae (Anourosorex, Crocidura, Sorex, and Suncus): Cr. lasiura (n = 242), Cr. shantungensis (n = 90), Su. murinus (n = 27), An. squamipes (n = 16), Cr. tanakae (n = 15), and So. caecutiens (n = 8) (Supplementary Fig. 1). The most frequently sampled species was Cr. lasiura, which was found in four out of five sampling locations. Each sampling location had two to three species present, indicating moderate species diversity.

Fig. 1: The distribution of six shrew species analyzed along the eastern coast of China.

A total of 398 shrews obtained between June 2015 and December 2021 are displayed. Geographical locations shaded in orange represent the survey sites, with shading reflecting the sample size. Circles within each province correspond to the different shrew species captured in that province, as indicated in the legend. The distribution map was generated with ArcMap 10.7.

These samples were grouped into a total of 58 libraries based on host species, sampling location and season (Supplementary Table 2). The number of shrews per pool ranged from 1 to 8 individuals. Overall, approximately 1.16 terabases of paired-end reads with a length of 150 bp each were generated through total RNA sequencing. After raw data filtering, trimming, and error removal, over 14 billion remaining reads from the 58 libraries were used for virus assembly and identification. In total, we identified 441,653,742 viral reads (3.13%), which were assembled into 6439 viral contigs. The viral contigs were classified into more than 41 viral families and grouped according to the known host associations determined through Blastx and phylogenetic analyses. Bacterial, fungal, and plant-associated viruses were also detected in a total of 4790 contigs, which were excluded from subsequent analyses.
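As a quick consistency check on the read statistics above, the total read count implied by the reported viral read number and viral fraction can be computed directly. This is a minimal sketch that assumes the 3.13% refers to the share of filtered reads; the variable names are illustrative only.

```python
# Figures quoted in the text above.
viral_reads = 441_653_742   # reads identified as viral
viral_fraction = 0.0313     # 3.13% of reads (assumed: of the filtered reads)

# Total number of filtered reads implied by these two figures.
implied_total_reads = viral_reads / viral_fraction

print(f"Implied total reads: {implied_total_reads / 1e9:.2f} billion")
```

Under this assumption the implied total is slightly above 14 billion reads, in line with the "over 14 billion" stated in the text.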

Based on the analysis of RdRp protein sequences from RNA viruses and replicase protein sequences from DNA viruses, a total of 126 viruses from 23 families were identified and characterized (Fig. 2a and Supplementary Tables 3, 4). These comprised 35 positive-sense single-stranded RNA (ssRNA) viruses from nine families (Arteriviridae, Astroviridae, Dicistroviridae, Flaviviridae, Hepeviridae, Iflaviridae, Permutotetraviridae, Picornaviridae, and Solinviviridae); 23 negative-sense ssRNA viruses from ten families (Aliusviridae, Artoviridae, Chuviridae, Hantaviridae, Lispiviridae, Nairoviridae, Orthomyxoviridae, Paramyxoviridae, Rhabdoviridae, and Xinmoviridae); six double-stranded RNA (dsRNA) viruses from two families (Picobirnaviridae and Sedoreoviridae); as well as one single-stranded DNA (ssDNA) virus in the family Circoviridae and one reverse-transcribing DNA (RTDNA) virus in the family Hepadnaviridae. Additionally, there were 60 unclassified RNA viruses. Among these viral families, Paramyxoviridae was detected in the largest number of libraries (24), followed by Picobirnaviridae (14), Picornaviridae (13), Arteriviridae (12), and Orthomyxoviridae (11), whereas other viral families were only sporadically detected (Fig. 2a). No viruses closely related to SARS-CoV, SARS-CoV-2 or other sarbecoviruses were found in any of the examined shrews.

Fig. 2: An overview of the virome associated with shrews.

a The species richness and abundance of the virome in shrews were assessed. The viruses from twenty-three viral families, unclassified Riboviria, unclassified Picornavirales, and unclassified Durnavirales, belonging to six viral types (positive-sense ssRNA virus, negative-sense ssRNA virus, dsRNA virus, ssDNA virus, RTDNA virus, and unclassified RNA virus), are shown. The relative abundance of viruses in each library was calculated and normalized as the number of mapped reads per million total reads (RPM). The colors on the heatmap represent different shrew species and sampling locations. Virome composition was determined for each shrew species, sampling season, and sampling location b. Principal coordinate analysis showing variation in virome composition among shrew species c, sampling seasons d, and sampling locations e. All P values were calculated using the adonis test.
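The RPM normalization mentioned in the caption can be sketched as follows. This is a minimal illustration, assuming RPM is the number of reads mapped to a virus divided by the total reads in the library, multiplied by one million; the function name and the example read counts are hypothetical and are not taken from the study.

```python
def reads_per_million(mapped_reads: int, total_reads: int) -> float:
    """Normalize a mapped-read count to reads per million total reads (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads * 1_000_000


# Hypothetical library: 5,000 reads mapped to one virus out of 250 million reads.
example_rpm = reads_per_million(5_000, 250_000_000)
print(example_rpm)
```

Normalizing by library size in this way makes abundances comparable between libraries that were sequenced to different depths.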

Virome composition in relation to the host, season, and sampling location

We examined the influence of host biology, province-based collection location, and seasonal variation on virome composition by assessing beta diversity between communities as well as alpha diversity within each community, using observed richness and the Shannon index. Significant variation was found at the family level among the six shrew species (adonis test, P = 0.001; Fig. 2b, c), four seasons (adonis test, P = 0.001; Fig. 2b, d), and five collection locations (adonis test, P = 0.008; Fig. 2b, e). No statistically significant differences in alpha diversity were detected among the five location groups based on pairwise Wilcoxon rank sum tests (P > 0.05) (Fig. 3a). However, the Cr. shantungensis group exhibited a significantly higher virus richness, with a median of 5, compared to the Cr. lasiura group, with a median of 2 (Wilcoxon rank sum test, P = 0.03; Fig. 3b). The Shannon index of the virome from the Cr. shantungensis group was also significantly higher than that of the Cr. lasiura group (Wilcoxon rank sum test, P = 0.046; Fig. 3b).
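The two alpha-diversity measures named above can be illustrated with a short sketch. It assumes the standard definitions (observed richness as the count of taxa with non-zero abundance, and the Shannon index computed with the natural logarithm); the function names and the abundance values are hypothetical, and the study's actual software and settings are not stated in this section.

```python
import math
from typing import Sequence


def observed_richness(abundances: Sequence[float]) -> int:
    """Count the taxa with non-zero abundance in one library."""
    return sum(1 for a in abundances if a > 0)


def shannon_index(abundances: Sequence[float]) -> float:
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with non-zero abundance."""
    total = sum(abundances)
    if total <= 0:
        return 0.0
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical RPM values for four viruses in one library (not study data).
library = [120.0, 30.0, 0.0, 50.0]
richness = observed_richness(library)
shannon = shannon_index(library)
print(richness, round(shannon, 3))
```

Richness only counts which viruses are present, whereas the Shannon index also reflects how evenly the abundance is spread across them, which is why the two measures are reported side by side.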

Fig. 3: Virus diversity in relation to host species and sampling locations.

Comparison of virome composition (species richness and abundance) and alpha diversity (Virus richness and Shannon index) across sampling location a and shrew species b. All P values were calculated using a two-sided Wilcoxon rank-sum test.

The virome compositions of the six shrew species groups were further visualized using Venn diagrams. The number of viruses varied across shrew species, ranging from 5 to 66, with the highest count observed in Cr. shantungensis and the lowest in So. caecutiens (Supplementary Table 5). A total of 92 out of 126 viral species (73%) were exclusively detected in a single shrew species, including 33 in Cr. shantungensis, 26 in Cr. lasiura, 15 in Su. murinus, 9 in Cr. tanakae, 6 in An. squamipes, and 3 in So. caecutiens (Supplementary Fig. 2). Thirty-four viral species were present in at least two shrew species; however, the majority (27/34) were harbored by Cr. lasiura and Cr. shantungensis, while no viral co-existence was observed between Cr. tanakae and any of the other five shrew species. Notably, Crohivirus A and Hubei arthropod virus 3 were each identified in four out of six shrew species. Crohivirus A was detected in Cr. lasiura, Cr. shantungensis, So. caecutiens and Su. murinus, whereas Hubei arthropod virus 3 was found in Cr. lasiura, Cr. shantungensis, An. squamipes and Su. murinus. No virus was shared by all six shrew species.
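The Venn-style comparison described above amounts to set operations on host lists. The sketch below uses only the host assignments and counts quoted in this paragraph; the variable names are illustrative.

```python
# Host species reported in the text for the two most widely shared viruses.
hosts_by_virus = {
    "Crohivirus A": {"Cr. lasiura", "Cr. shantungensis", "So. caecutiens", "Su. murinus"},
    "Hubei arthropod virus 3": {"Cr. lasiura", "Cr. shantungensis", "An. squamipes", "Su. murinus"},
}

crohivirus_hosts = hosts_by_virus["Crohivirus A"]
hav3_hosts = hosts_by_virus["Hubei arthropod virus 3"]

# Hosts carrying both viruses (intersection) and hosts carrying only one of them.
shared_hosts = crohivirus_hosts & hav3_hosts
unshared_hosts = crohivirus_hosts ^ hav3_hosts

# Species-exclusive virus counts quoted in the text.
exclusive_counts = {
    "Cr. shantungensis": 33, "Cr. lasiura": 26, "Su. murinus": 15,
    "Cr. tanakae": 9, "An. squamipes": 6, "So. caecutiens": 3,
}
exclusive_total = sum(exclusive_counts.values())
exclusive_percent = round(100 * exclusive_total / 126)

print(sorted(shared_hosts), exclusive_total, exclusive_percent)
```

The exclusive counts sum to 92, which together with the 34 shared viral species accounts for all 126 viruses.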

Determination of viruses with human pathogenicity and spillover risk

The study identified a total of 126 viral species associated with vertebrates and invertebrates, of which 54 were previously documented (Fig. 4a). To assess their potential impact on public health, we focused on taxonomic groups that contain viruses with zoonotic potential (i.e., the ability to infect humans) and spillover risk (the tendency to cross species barriers and infect other orders). Among the 54 known viruses, 34 were found to possess spillover risk, while six RNA viruses, namely influenza A virus (H5N6), rotavirus A, rabies virus, avian paramyxovirus 1, rat hepatitis E virus, and LayV, were identified as human pathogens (Fig. 4b and Supplementary Fig. 3). With the exception of So. caecutiens, all five other shrew species were found to carry at least one human pathogen or spillover-risk virus (Fig. 4c). Cr. shantungensis had the highest number of human pathogens (5) and spillover-risk viruses (24), followed by Cr. lasiura (3 and 14, respectively), An. squamipes (2 and 5), Su. murinus (2 and 2), and Cr. tanakae (1 and 1). It is noteworthy that Hubei arthropod virus 3 was detected in four shrew species (An. squamipes, Cr. lasiura, Cr. shantungensis, Su. murinus), with a prevalence of 50% (1/2), 30% (10/33), 50% (7/14), and 40% (2/5) in their respective libraries. Among the 40 human pathogens and spillover-risk viruses identified in this study, a large proportion, namely 77.5% (31/40), were discovered within the past decade, with the highest number of new findings occurring in 2016 (Fig. 4d).
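The library-level prevalence figures quoted above follow from simple proportions, as the sketch below shows. It uses only the counts given in the text; the helper name is hypothetical, and rounding to whole percentages is an assumption based on how the values are reported.

```python
def prevalence_percent(positive: int, total: int) -> float:
    """Percentage of libraries in which a virus was detected."""
    return 100.0 * positive / total


# (positive libraries, total libraries) for Hubei arthropod virus 3, as quoted in the text.
library_counts = {
    "An. squamipes": (1, 2),
    "Cr. lasiura": (10, 33),
    "Cr. shantungensis": (7, 14),
    "Su. murinus": (2, 5),
}

# Prevalence per host species, rounded to whole percentages.
prevalence = {
    species: round(prevalence_percent(pos, total))
    for species, (pos, total) in library_counts.items()
}

# Share of the 40 pathogenic or spillover-risk viruses discovered within the past decade.
recent_share = prevalence_percent(31, 40)

print(prevalence, recent_share)
```

Because the denominators are library counts rather than individual animals, these values describe how often a virus appears in pooled libraries, not the infection rate among individual shrews.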

Fig. 4: Viruses with human pathogenicity and spillover risks.

a Taxonomy of all 126 viruses identified in this study. The count of viruses in each family is labeled in red font. The counts of new and known viruses are labeled in white and black font, respectively. b The 40 viral species with human pathogenicity (n = 6) and spillover risk (n = 34) from ten viral families and unclassified Riboviria identified in this study, with each family represented by a distinct color. The viruses that could not be assigned to any known family are shown in shades of orange. c Human pathogenic viruses (in red circles) and spillover-risk viruses (in blue circles) in relation to five shrew species. The number of libraries in which each virus was detected is represented by different colors. d The discovery year of human pathogenic viruses and spillover-risk viruses was determined based on the year of detection or infection, rather than the publication year. In cases where a range of detection years existed, we considered the earliest year. The images were retrieved from Wikimedia Commons (https://commons.wikimedia.org/wiki/Main_Page).

Geographic distribution of viruses with spillover risk

The majority of newly discovered viral species were detected in Shandong province (45 out of 72), followed by Zhejiang (14), Guangdong (7), Jiangsu (6) and Fujian (2) provinces (Supplementary Fig. 4). Notably, ten viruses from six families that were previously unreported in China were identified (Supplementary Fig. 5a and Supplementary Table 6), originating from nine countries across six continents (Asia, Africa, North America, South America, Oceania, and Europe). Of particular significance, six of these viruses carry spillover risk: army ant associated dicistrovirus 5, Anourosorex squamipes sapelovirus, Chimba virus, cricket paralysis virus, Gorebridge virus, and Solenopsis invicta virus 11. The other four viruses (Crocidura lasiura henipavirus, crohivirus A1, Olivier’s shrew virus 1, and Jeju virus) have not yet been determined to pose a spillover risk.

Out of the 54 known viruses identified in the study, 31 were previously reported exclusively in China, and detailed discovery locations were available for 20 of them. A comparison between the current host/location and the initial discovery host/location revealed that 15 out of 20 viruses had expanded either their host range or their epidemic areas (Supplementary Fig. 5b–g). It is worth noting that 14 of them are spillover-risk viruses originally discovered in different insect species from Hubei or Sichuan provinces, but unexpectedly detected in shrews collected from multiple sampling locations during this study (Supplementary Fig. 5b–f). The remaining virus, Crocidura tanakae henipavirus, a shrew-borne virus without spillover risk that was previously identified in Hubei, was also found in shrews captured from Shandong and Guangdong in the current study (Supplementary Fig. 5g).

Discovery of new RNA viruses

A total of 72 viruses were identified as new species based on the current demarcation criteria of the ICTV13 (Supplementary Table 3), comprising 19 positive-sense ssRNA viruses, nine negative-sense ssRNA viruses, three dsRNA viruses, and 41 unclassified RNA viruses (Fig. 4a). Apart from the unassigned viruses, these new viruses belonged to 17 known viral families within nine orders: Iflaviridae (5 species), Picornaviridae (5), Arteriviridae (3), Astroviridae (2), Nairoviridae (2), Rhabdoviridae (2), Sedoreoviridae (2), Aliusviridae (1), Artoviridae (1), Chuviridae (1), Dicistroviridae (1), Hepeviridae (1), Lispiviridae (1), Permutotetraviridae (1), Picobirnaviridae (1), Solinviviridae (1), and Xinmoviridae (1). Of the 72 new viruses, 16 exhibited high prevalence, as they appeared in multiple libraries within either the same or different species. For example, Crocidura shantungensis arterivirus 1 was detected in 12 libraries belonging to three shrew species (Cr. lasiura, Cr. shantungensis, Su. murinus) (Supplementary Fig. 6). The unclassified viruses were assigned to unclassified Riboviria (n = 32), unclassified Picornavirales (n = 8), and unclassified Durnavirales (n = 1).

Additionally, a total of 37 complete viral genomes were obtained from 32 new viral species in five viral families, unclassified Riboviria, and unclassified Picornavirales (Supplementary Table 5). No recombination events were detected in the new viruses, and SimPlot analysis revealed substantial genetic divergence between 4 representative new viruses and related members of their respective families (Supplementary Fig. 7).

Diversification and evolution of viruses in shrews

We individually verified the genome structure of the 72 new viral species and constructed phylogenetic trees based on the protein sequences of RdRp for RNA viruses and replicase protein for DNA viruses, encompassing all 126 viral species (Fig. 5; see Supplementary Figs. 8–32 for detailed structures and phylogenies).

Fig. 5: Phylogenetic diversity of 23 major viral families.

Phylogenetic trees were constructed based on amino acid sequences of the RdRp protein for RNA viruses or the replicase protein for DNA viruses for the currently identified viruses, which included: a Twenty-three negative-sense single-stranded RNA (ssRNA) viruses; b Thirty-five positive-sense ssRNA viruses; c Six double-stranded RNA (dsRNA) viruses; d One single-stranded DNA (ssDNA) virus; e One reverse-transcribing DNA (RTDNA) virus. The best-fitting model was determined by the ModelFinder program implemented in IQ-TREE v1.6.12 based on the Bayesian information criterion (BIC). Phylogenetic inference was performed using the maximum likelihood (ML) method with 1000 bootstrap replicates. Branch lengths are indicated by the scale bar. The red, blue, green, and black circles represent new viruses, previously published human pathogenic viruses, previously published spillover-risk viruses, and other viruses identified in this study, respectively.

Negative-sense ssRNA viruses

A total of 23 negative-sense ssRNA viruses were detected, including nine newly identified viruses and 14 previously identified ones, belonging to ten viral families (Paramyxoviridae, Hantaviridae, Rhabdoviridae, Nairoviridae, Aliusviridae, Artoviridae, Chuviridae, Lispiviridae, Orthomyxoviridae, and Xinmoviridae, in descending order of virus count) (Fig. 5a and Supplementary Figs. 9–18). Within the Paramyxoviridae family, nine known viruses were characterized, two of which are human pathogenic viruses (avian paramyxovirus 1 and LayV) (Supplementary Fig. 9). In the RdRp gene, our avian paramyxovirus 1 shares amino acid (aa) sequence identities ranging from 97.5 to 98.4% with previously reported strains, while LayV shares identities ranging from 87.5 to 99.8%. Two new viruses belonging to the Rhabdoviridae family were identified, with aa identities ranging from 41.5 to 79.8% compared to other associated viruses. Additionally, the nearly complete genome of rabies virus was recovered, sharing 99.8% aa identity with a previously reported strain from a dog in China (Supplementary Fig. 10). One new virus within the Chuviridae family was identified and grouped into the genus Chuvivirus (Supplementary Fig. 11), while a newly discovered viral species with a complete genome was found in the Lispiviridae family (Supplementary Fig. 12). The large protein (2032 aa) of Crocidura lasiura lispivirus 2 contained the Mononeg_RNA_pol domain, which is characteristic of the RdRp of mononegaviruses (Supplementary Fig. 8). Three shrew-specific known viruses (Cao Bang virus, Jeju virus, and Imjin virus) were identified in three shrew species with > 80% aa sequence identity to their closest known relatives within the family Hantaviridae (Supplementary Fig. 13). Two strains of one new virus were identified in the Artoviridae family and grouped into the genus Peropuvirus, with < 50% aa sequence identity compared to other related viruses (Supplementary Fig. 14).
Within the Nairoviridae family, a new virus named Crocidura tanakae nairovirus 1 with a complete genome was discovered, showing 61.4% identity with its closest known relative in the RdRp gene. The L segment encoded a large protein (3890 aa) containing the Bunya_RdRp domain, which is characteristic of the RdRp of bunyaviruses (Supplementary Figs. 8 and 15). Additionally, one new virus was discovered in each of the families Aliusviridae and Xinmoviridae. These viruses were classified as new members of the genera Ollusvirus and Hoptevirus, respectively (Supplementary Figs. 16, 17). Furthermore, a complete genome of influenza A virus (H5N6), belonging to the genus Alphainfluenzavirus within the Orthomyxoviridae family, was also identified, showing 99.7–100% aa sequence identity in the RdRp gene with previously reported strains from Japan and Bangladesh (Supplementary Fig. 18).

Positive-sense ssRNA viruses

A total of 19 new and 16 known positive-sense ssRNA viruses from nine viral families were identified (Fig. 5b and Supplementary Figs. 19–27). The 19 new viruses were distributed across eight families: Iflaviridae (n = 5), Picornaviridae (n = 5), Arteriviridae (n = 3), Astroviridae (n = 2), Dicistroviridae (n = 1), Hepeviridae (n = 1), Permutotetraviridae (n = 1), and Solinviviridae (n = 1). The 16 known viruses were distributed across seven families: Dicistroviridae (n = 4), Picornaviridae (n = 4), Permutotetraviridae (n = 3), Iflaviridae (n = 2), Arteriviridae (n = 1), Flaviviridae (n = 1), and Hepeviridae (n = 1). Among them, five new viruses in the family Picornaviridae exhibited significant divergence from known viruses, including a new member of the genus Parabovirus (Sorex caecutiens picornavirus) and four unclassified picornaviruses (Supplementary Fig. 19). Among the new viruses within the families Iflaviridae and Dicistroviridae, four were categorized under the genus Iflavirus, while one was grouped with the unclassified dicistroviruses (Supplementary Figs. 20, 21). Notably, the RNA genome of Crocidura shantungensis dicistrovirus 1 contained two non-overlapping ORFs (ORF1 and ORF2). ORF1 encoded the nonstructural proteins (1769 aa), including RNA helicase (Hel) and RdRp, while ORF2 encoded the capsid proteins (812 aa) (Supplementary Fig. 8). Within the Arteriviridae family, three new viruses (Crocidura shantungensis arterivirus 1, Crocidura shantungensis arterivirus 2, Crocidura shantungensis arterivirus 3) were identified but could not be assigned to any existing virus group. These three viruses exhibited aa identities ranging from 54.2 to 78.8% with other arteriviruses. Additionally, one known virus, Olivier’s shrew virus 1, was detected within this family and displayed an aa identity of 82.5% with a previously reported strain of Olivier’s shrew virus 1 found in Cr. olivieri in Guinea (Supplementary Fig. 22).
Four viral species, comprising three known ones and one newly discovered species, were detected within Permutotetraviridae but remained unclassified at the genus level (Supplementary Fig. 23). Within the Astroviridae family, two new viruses (Crocidura lasiura astrovirus and Crocidura tanakae astrovirus) were discovered, showing <60% aa identity with their closest known relatives based on RdRp sequence similarity (Supplementary Fig. 24). Only one new virus (Crocidura lasiura hepevirus) was identified among the unclassified members of the family Hepeviridae. A previously known virus associated with human infection, rat hepatitis E virus, was identified in Cr. tanakae within the genus Rocahepevirus (Supplementary Fig. 25) and shared 99% aa identity with a previously reported strain from Apodemus chevrieri in China. Within the Flaviviridae family, only one known virus (Jingmen Crocidura shantungensis pestivirus 1) was identified and clustered with other species in the genus Pestivirus (Supplementary Fig. 26). One new virus, shrew solinvivirus, was discovered among the unclassified Solinvi-like viruses within the Solinviviridae family and exhibited 40% aa identity with Hangzhou Solinvi-like virus 1 (Supplementary Fig. 27).

DsRNA viruses

A total of six dsRNA viruses, including three new viruses and three known viruses, were identified across two families (Sedoreoviridae, Picobirnaviridae) (Fig. 5c and Supplementary Figs. 28, 29). Within the Sedoreoviridae family, two new viruses clustered within the genus Rotavirus with < 80% aa identity to other viruses (Supplementary Fig. 28). Two known viruses, rotavirus A and bat rotavirus, were also identified in this family. Notably, a human pathogenic virus (rotavirus A) was identified in two shrew species (Cr. shantungensis and Su. murinus), sharing 87.7–98.8% identity with its closest known relatives. Furthermore, bat rotavirus was discovered in shrews; previously it had only been identified in bats in Bulgaria, with which our strain from the shrews shared 98.9% aa identity. Finally, only one new virus was identified within the genus Picobirnavirus of the Picobirnaviridae family, and it exhibited significant divergence from known viruses (Supplementary Fig. 29).

SsDNA and RTDNA viruses

Phylogenetic analysis revealed the presence of two known DNA viruses: Sichuan mosquito circovirus 3, a ssDNA virus, and shrew hepatitis B virus, a reverse-transcribing DNA virus (Fig. 5d, e and Supplementary Figs. 30, 31). These two DNA viruses were classified within the families Circoviridae and Hepadnaviridae, respectively. Our Sichuan mosquito circovirus 3 sequence clustered closely with previously reported members of the genus Circovirus within the family Circoviridae (Fig. 5d and Supplementary Fig. 30). Pairwise similarity analysis demonstrated a high aa identity of 94.3% between our sequenced Sichuan mosquito circovirus 3 and a previously reported strain from mosquitoes in Sichuan province in 2020. Within the family Hepadnaviridae, both shrew hepatitis B viruses, identified here from Su. murinus individuals, exhibited aa identities ranging from 95.1% to 96.1% with shrew hepatitis B virus identified in Cr. lasiura in China. The currently identified shrew hepatitis B viruses were closely related to other members of the genus Orthohepadnavirus (Fig. 5e and Supplementary Fig. 31).

Cross-species transmission of viruses

Based on a combination of our data and the virus records from the NCBI database, all 54 known viruses were identified as capable of cross-species transmission, being detectable in two or more vertebrate and invertebrate species (Fig. 6a). Among these, 64.81% (35/54) were detected for the first time in shrews. Notably, the currently identified viruses with cross-species transmission encompassed 25 vertebrate-specific viruses, 3 vertebrate- and invertebrate-associated viruses, and, intriguingly, 26 viruses that were previously known as invertebrate-specific. It is also noteworthy that human pathogenic viruses exhibited a higher ratio of host orders to viral species (30/6) compared to the spillover-risk viruses (17/34) and shrew-specific viruses (1/14) (Supplementary Table 7). All human pathogenic viruses were observed across ≥ 3 mammal orders, while the majority of spillover-risk viruses (76.47%, 26/34) were exclusively detected in a single mammal order, Eulipotyphla. Remarkably, avian paramyxovirus 1 had the widest range of vectors and hosts (21 orders), co-circulating among four mammal orders, 16 bird orders, and one insect order (Supplementary Table 7). Additionally, the 17 enveloped viruses demonstrated a broader host range (1–21 susceptible host orders) compared to the 18 non-enveloped viruses (1–9 susceptible host orders).
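The host-range comparison above rests on the ratio of host orders to viral species in each category. The sketch below recomputes these ratios and the two quoted percentages from the counts in the text; the dictionary layout and variable names are illustrative only.

```python
# (host orders, viral species) per category, as quoted in the text.
categories = {
    "human pathogenic": (30, 6),
    "spillover-risk": (17, 34),
    "shrew-specific": (1, 14),
}

# Host orders per viral species; a higher value suggests a broader host range.
ratios = {name: orders / species for name, (orders, species) in categories.items()}

# Percentages quoted in the paragraph.
first_in_shrews = round(100 * 35 / 54, 2)   # known viruses first detected in shrews
single_order = round(100 * 26 / 34, 2)      # spillover-risk viruses confined to Eulipotyphla

print(ratios, first_in_shrews, single_order)
```

The three categories together cover all 54 known viruses (6 + 34 + 14), and the ratio falls from 5 host orders per virus for human pathogens to well below 1 for shrew-specific viruses.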

Fig. 6: Cross-species transmission of viruses.

a The cross-species pattern of the 54 known viruses is depicted based on a combination of our data and the virus records from the NCBI database. Green circles indicate viruses that were identified in shrews for the first time. Red and blue circles represent viruses with human pathogenicity and spillover risk identified in this study, respectively. Green triangles and red triangles denote enveloped viruses and non-enveloped viruses, respectively. The two DNA viruses are marked with an asterisk (*). b A host network map illustrating connections among different vector/host orders through the 54 known viruses shared in this study. The number of viruses shared by each vector/host group is indicated in brackets. Mammalia, aves, insecta and other invertebrates are shaded in red, green, and blue, respectively. c Virus transmission among shrews for the 72 new viruses identified in this study. Host range was determined solely based on the host species information obtained during the current research. The cross-species transmitted new viruses are labeled in red font. d Venn diagrams illustrating the number of cross-species transmitted new viruses out of the 72 identified in this study. The count of cross-species transmitted new viruses is marked in blue font. The images were retrieved from Wikimedia Commons (https://commons.wikimedia.org/wiki/Main_Page).

To gain an in-depth insight into the relationship between viruses and host diversity, we conducted a resampling analysis on association networks involving 54 known viruses and 39 orders encompassing both vertebrates and invertebrates (Fig. 6b). We identified seven specific host orders that exhibited close association with Eulipotyphla in terms of virus sharing. These included four mammal orders (Artiodactyla, Chiroptera, Primates, and Rodentia) as well as three invertebrate orders (Diptera, Hymenoptera, and Orthoptera), each hosting more than four of these known viruses.

Out of the 72 new viruses discovered, 12 were found in more than one shrew species, while five were identified in two genera of shrews. The cross-species transmitted viruses identified here belonged to four viral families, as well as unclassified Riboviria and unclassified Picornavirales. It is noteworthy that the families Artoviridae, Arteriviridae, Astroviridae, and Sedoreoviridae accounted for 41.7% (5/12) of the cross-species new viruses (Fig. 6c). A total of 60 new viral species were exclusively detected in a single shrew species: 18 in Cr. lasiura, 17 in Cr. shantungensis, 14 in Su. murinus, seven in An. squamipes, two in Cr. tanakae, and two in So. caecutiens (Fig. 6d).

Discussion

Recently, meta-genomic and meta-transcriptomic analyses have been increasingly used to survey the virome in diverse animals over a broad geographical range. These analyses have provided valuable data that significantly contribute to our understanding of the existing viral population under ecological dynamics3,4,14,15. Such studies play a crucial role in preemptively addressing zoonotic emerging infectious diseases (EIDs) before they spill over into human populations and result in large-scale outbreaks. In this study, we characterize the virome of the primary shrew species along the eastern coast, which represents the main habitat region for shrews in China16. Lung samples were used for analysis because lungs are primary sites of viral infection in animals and play a significant role in facilitating horizontal viral transmission between species17,18. We have identified 126 known and new viruses with single-stranded or double-stranded genomes that exhibit either negative- or positive-sense polarity and monopartite or segmented structures. This analysis sheds light on the genetic diversity, evolution, and host range of these viruses.

Shrews are ranked as the fourth most diverse species of mammals, following the families Muridae, Cricetidae and Vespertilionidae. They encompass at least 385 species across 26 genera that inhabit various habitats worldwide, include marshes, meadows, grasslands, forests, woodlands etc5. They play a crucial role in the transmission of various zoonotic pathogens between humans and animals. The most commonly observed hantaviruses identified in shrews include Hantaan virus, Seoul virus, Dobrava virus, and Puumala virus, which can cause hemorrhagic fever with renal syndrome (HFRS) in humans9. Additionally, rat hepatitis E virus responsible for persistent hepatitis in humans, had also been found in shrews10. We have also discovered LayV, a paramyxovirus belonging to the Henipavirus genus of the Paramyxoviridae family in the eastern China7, that are closely related to the deadly Nipah and Hendra viruses. Despite the increasing number of viruses discovered in insectivorous shrews, their roles in virus emergence, transmission, and evolution have received limited attention. As of 2023, only a few studies have been conducted on the viral diversity in shrews3,14,19,20. Sasaki et al. found that nearly half of the enteric virome from shrews captured in Zambia belonged to invertebrate viruses in the family Dicistroviridae19. Wu et al. identified only six new viruses through a survey on viral diversity using pharyngeal and anal swabs collected from 224 shrews in China14. He et al.’s metagenomic analysis on the liver virome in Su. murinus in Fujian and Guangdong province of China20, revealed sequences related to 12 viral families and 18 viral genera20. Chen et al. on the other hand, performed a metagenomic analysis on internal organ and fecal samples from wild bats, rodents, and shrews sampled in Hubei and Zhejiang provinces3, identifying 263 viruses from 261 shrews. 
In this study, we conducted a meta-transcriptomic analysis of six common shrew species belonging to four genera across five sampling provinces along the eastern coast of China, providing significant updates compared to previous studies (Supplementary Fig. 33). When comparing our study with Chen et al.’s research focusing on three common shrew species or one specific sampling location (Zhejiang province), it is evident that our study has identified a greater number of known viruses as well as new viruses (Supplementary Fig. 33).

A total of 72 new viruses were discovered in the current study, with the majority of classified viruses belonging to the highly diverse family Picornaviridae, which comprises 68 genera and 158 species13. Although picornaviruses primarily infect humans and other vertebrates, their presence in shrews has rarely been reported3,19. In this study, we identified five new shrew picornaviruses, one of which was distantly related to members of the genus Parabovirus, while the other four remained unclassified picornaviruses. Additionally, we detected three spillover-risk picornaviruses for the first time in shrews: Anourosorex squamipes sapelovirus, parabovirus A1, and rodent cardiovirus. These rodent-borne viruses were previously identified from rodent feces collected in the United Kingdom (Anourosorex squamipes sapelovirus) in 201721, Zhejiang province (parabovirus A1) in 2012, and Taizhou city in China (rodent cardiovirus) in 201422. Our findings contribute substantially to knowledge of the genetic diversity, geographical distribution, host range, and evolution of the Picornaviridae family.

The virome of a specific host is shaped by various factors, including but not limited to the host’s taxonomy, habitat, and environmental conditions. In this study, we found that host taxonomy plays a crucial role in determining viral diversity in shrews. Among the six surveyed shrew species, Cr. shantungensis harbored a higher number of both known and new viruses than the other species. Furthermore, its viral composition differed significantly from the others despite their similar ecological niches and geographic distributions. Conversely, factors such as sampling location and sampling season had a less significant impact on viral diversity. This observation can be partially attributed to the wide geographic range, large population sizes, and high population densities of Cr. shantungensis in East Asia (including northeastern and central China, Korea, and the Russian Far East)23 (Supplementary Fig. 34). These findings together highlight the role of shrews as hyper-reservoirs in facilitating the zoonotic spillover of pathogens, i.e., the transmission of a pathogen from animals to humans. In the present study, we identified six zoonotic pathogens that are highly relevant to human health, including influenza A virus (H5N6), rotavirus A, avian paramyxovirus 1, rat hepatitis E virus, rabies virus, and LayV. It is noteworthy that while LayV had been previously detected in patients and shrews during our earlier study7, this study also revealed a new shrew host species, Su. murinus, in addition to the previously reported hosts, Cr. lasiura and Cr. shantungensis. Furthermore, this study identified rat hepatitis E virus in Cr. tanakae; this virus is an emerging zoonotic pathogen associated with acute hepatitis on a global scale24. In Hong Kong, China alone, there have been 16 documented cases of human infection out of a total of 21 reports worldwide25,26.
Additionally, we present evidence of rabies virus in shrews, indicating an expanded host range for this zoonotic pathogen, which causes rabies, a fatal zoonosis accounting for over 60,000 annual deaths among humans and other mammals worldwide27. These findings collectively emphasize the urgent need to enhance surveillance efforts among human populations that are exposed to these shrew species.

Viral cross-species transmission is the primary driver behind the increasing incidence of emerging diseases; however, our current understanding of this complex process remains limited. Certain risk factors have been identified, such as ecological perturbations and the characteristics of viral reservoir species. From the perspective of viral structure, we show that enveloped viruses generally exhibit a wider host range than non-enveloped viruses, i.e., 1–21 host orders for 18 enveloped viruses versus 1–9 host orders for 17 non-enveloped viruses (Fig. 6a). This finding aligns with previous research indicating that enveloped viruses infect a greater number of host species and are more likely to be zoonotic than non-enveloped viruses28.

It has been reported that a significantly higher proportion of invertebrate-associated viruses were identified in shrews compared to bats and rodents3. Our study also unveiled a substantial number of viruses previously thought to be specific to or associated with invertebrates. The majority of these (28 out of 29) were identified for the first time in shrews, with some being present across multiple shrew species. Notably, certain viruses were related to those previously identified in both invertebrates and mammals, while others belonged to insect-specific virus families (e.g., Dicistroviridae). This observation might be attributed to the arthropod-based diet of shrews, which includes fleas, ticks, and mites. It is plausible that certain invertebrate-associated viruses reached shrews through their diet. Nonetheless, the capacity of shrews to temporarily harbor these invertebrate viruses could contribute to the transmission chains of the viruses or facilitate the emergence of new strains or variants. This finding implies that shrews might play significant roles in the natural circulation of arboviruses and serve as critical links in transmitting arboviruses between humans and animals.

Our study has inherent limitations that need to be acknowledged. Firstly, due to the limited sample size for certain shrew species, such as So. caecutiens and Cr. tanakae, it was not feasible to conduct a more systematic comparison of viral diversity and abundance based on host, season, and sampling location. Secondly, due to the extensive nature of the dataset, we were unable to verify all references and establish the validity of the identified hosts retrieved from NCBI. Thirdly, given that shrews inhabit diverse ecosystems, such as marshes, meadows, grasslands, forests, and woodlands, it is imperative to explore the association between shrew habitats and virome composition in future studies. Further investigations are also required to determine the potential pathogenicity of the newly identified viruses toward humans through virus isolation and experimental infection in animal models.

The current study provides valuable data for a better understanding of the existing viral population and its ecological dynamics within shrew populations along the eastern coast of China. The identification and characterization of the virome carried by shrews will have significant implications for viral taxonomy and may offer insight into zoonotic diseases. Our findings lay the foundation for further studies into a wider range of viral lineages prior to potential spillover events into human populations, thereby contributing to the prevention of large-scale outbreaks caused by emerging infectious diseases transmitted by shrews.

Methods

Sample collection

During the period from June 2015 to December 2021, as part of a long-term project aimed at evaluating the role of wild small mammals in carrying zoonotic pathogens, shrews were captured from a total of 13 sites along the eastern coastal region in five provinces (Shandong, Jiangsu, Zhejiang, Fujian, and Guangdong provinces) (Supplementary Table 1). Briefly, snap traps were used to capture wild small mammals which were then morphologically identified to the species level, and further confirmed through sequencing of the cytochrome b (cytb) gene29. Climate data for the sampled locations was also collected (https://www.worldclim.org/) (Supplementary Table 8). Among all captured wild small mammals, those identified as shrews were specifically chosen for virome analysis.

RNA extraction, library preparation and sequencing

The shrews were sampled for blood and/or multiple organs. Lung samples from the same species, seasons, and locations were combined to form 58 distinct pools representing 6 common shrew species: Anourosorex squamipes (Chinese mole shrew), Crocidura lasiura (Ussuri white-toothed shrew), Crocidura shantungensis, Crocidura tanakae, Sorex caecutiens (Masked shrew), and Suncus murinus (Asian house shrew) (Supplementary Table 2). Next-generation sequencing (NGS) was conducted following a previously described method7. Briefly, RNA extraction from lung samples was performed using the AllPrep DNA/RNA Mini Kit (QIAGEN) according to the manufacturer’s instructions. Nucleic Acid Microbes Purification kit (KAPA RNA HyperPrep Kit with RiboErase, Roche) was used to enrich viral RNA from total RNA samples in each pool. The remaining RNA was subjected to fragmentation, reverse transcription, end adenylation, and adaptor ligation. Following cDNA purification, PCR amplification was performed. Paired-end sequencing (150 bp) of each RNA library was performed with the MGI High throughput Sequencing Set (PE150) on MGISEQ-2000 platforms (MGI, China).

Sequence analyses and viral species demarcation

Low-quality raw reads from each library were filtered using the Trimmomatic program (v0.39). Ribosomal (r)RNA reads were removed by aligning them to the SILVA rRNA database using Bowtie2 (v2.4.5). Following data filtering, trimming, and error removal, the remaining high-quality reads underwent de novo assembly using MEGAHIT (v1.2.9) with default parameter settings. The assembled contigs were aligned against the non-redundant (nr) protein database from GenBank, available until July 2023, using Diamond Blastx (v2.0.14.152). Viral-related contigs with e-values lower than 1e-5 were retained for further analysis. Contigs showing significant similarity to viral proteins under Riboviria (NCBI Taxid 2559587) were initially identified as potential virus sequences, while those assigned to Retroviridae were excluded from subsequent analysis. In cases where a viral contig did not match any defined family, it was designated as an unclassified virus. Potential host associations for the virus contigs were preliminarily determined based on taxonomic information obtained from the Blastx results and subsequently confirmed through phylogenetic relationships with viruses having known host associations. Briefly, viral contigs falling within bacterial, fungal or plant virus groups were excluded, while those clustering with known vertebrate and/or invertebrate-associated virus groups were retained.
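The e-value filter described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the Blastx hits were written in the common 12-column BLAST-style tabular layout (query, subject, percent identity, ..., e-value in column 11, bit score in column 12), and the names `Hit`, `parse_hits`, and `retain_significant` are hypothetical.

```python
from typing import Iterable, List, NamedTuple

E_VALUE_CUTOFF = 1e-5  # threshold stated in the Methods


class Hit(NamedTuple):
    contig: str      # query contig identifier
    subject: str     # matched protein identifier
    identity: float  # percent amino acid identity
    evalue: float    # alignment e-value


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated hits (assumed 12-column tabular layout)."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[2]), float(fields[10])))
    return hits


def retain_significant(hits: Iterable[Hit], cutoff: float = E_VALUE_CUTOFF) -> List[Hit]:
    """Keep only hits whose e-value is strictly lower than the cutoff."""
    return [h for h in hits if h.evalue < cutoff]
```

The taxonomic steps (restricting to Riboviria, excluding Retroviridae, and removing bacterial, fungal or plant virus groups) would follow this filter and are not shown here.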

The clean reads were aligned back to the assembled virus contigs using Bowtie2 (v2.4.5) with an end-to-end alignment method to assess coverage and depth and thereby ensure assembly quality. SAMtools (v1.5) was used for sorting and indexing these alignments, from which the read counts for each virus contig were obtained. The abundance of each viral species was estimated as the number of mapped reads per million total non-rRNA reads (RPM) in each library. Virus contigs with RPM lower than 1 in a library were considered false positives and excluded from subsequent analysis. Viral contigs encompassing the RNA-dependent RNA polymerase (RdRp) for RNA viruses or replicase for DNA viruses were retained. Species assignment of the resulting viral contigs followed the species demarcation criteria established by the International Committee on Taxonomy of Viruses (ICTV, https://ictv.global/)13 (Supplementary Table 3). In cases where a genus lacked clear species demarcation criteria, a relatively strict threshold of 80% amino acid identity to known viral species for RdRp or replicase was applied (Supplementary Table 3)3,30. These criteria were also employed to identify new viral species. Additionally, we investigated the cross-species events of all identified viruses. For new viruses, host range determination was based solely on the host species information obtained in this study. For known viruses, host range determination relied on host species information retrieved from NCBI and supplemented with newly identified hosts here. Briefly, all nucleotide sequences of the 54 detected known viruses were retrieved from the NCBI/GenBank database version released on December 15, 2023 (https://ftp.ncbi.nlm.nih.gov/genbank), and detailed information (including references, sampling locations, host, and collection date) was collected. Further details were obtained from the references if the original virus sequence information was incomplete.
We further defined the ratio of host orders/viral species as the number of hosts at the order level that can harbor or transmit the total number of viruses at the species level. Specifically, if a virus species was found in more than one host species, it was defined as a “cross-species” virus3. By contrast, a “spillover-risk” virus is defined as one that has not been reported to infect humans but has demonstrated transmission across more than one order of animals, thus showing higher potential for zoonotic transmission31.
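As a worked illustration of the abundance and novelty thresholds above, the following sketch computes RPM and applies the two cutoffs. It is an assumption-laden example rather than the authors' code: the function names are hypothetical, and how values exactly at the boundary are treated (RPM equal to 1, identity equal to 80%) is an assumption, since the text only states "lower than 1" and an 80% threshold.

```python
RPM_MIN = 1.0            # contigs with RPM lower than 1 are treated as false positives
IDENTITY_CUTOFF = 80.0   # fallback amino acid identity threshold (%) for RdRp/replicase


def reads_per_million(mapped_reads: int, total_non_rrna_reads: int) -> float:
    """Mapped reads per million total non-rRNA reads in one library."""
    if total_non_rrna_reads <= 0:
        raise ValueError("total_non_rrna_reads must be positive")
    return mapped_reads * 1_000_000 / total_non_rrna_reads


def passes_abundance_filter(rpm: float, minimum: float = RPM_MIN) -> bool:
    """Retain a contig unless its RPM is lower than the minimum."""
    return rpm >= minimum


def is_new_species(aa_identity_percent: float, cutoff: float = IDENTITY_CUTOFF) -> bool:
    """Assumed reading: identity below the cutoff indicates a new species.

    Applies only to genera lacking explicit ICTV demarcation criteria.
    """
    return aa_identity_percent < cutoff
```

For example, a contig with 50 mapped reads in a library of 10,000,000 non-rRNA reads has an RPM of 5 and would be retained, whereas 5 mapped reads in the same library give an RPM of 0.5 and would be excluded.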
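The two definitions above can be expressed as a small classification routine. This is a sketch under stated assumptions: `HostRecord` and `classify_virus` are hypothetical names, host records are assumed to carry a species name and a taxonomic order, and whether a virus has been reported to infect humans is passed in as a flag rather than derived from data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class HostRecord:
    species: str  # host species name
    order: str    # taxonomic order of the host


def classify_virus(hosts: Iterable[HostRecord], infects_humans: bool) -> Dict[str, object]:
    """Apply the cross-species and spillover-risk definitions to one virus species."""
    hosts = list(hosts)
    n_species = len({h.species for h in hosts})
    n_orders = len({h.order for h in hosts})
    return {
        "n_host_species": n_species,
        "n_host_orders": n_orders,
        # found in more than one host species
        "cross_species": n_species > 1,
        # not reported in humans, but found in more than one host order
        "spillover_risk": (not infects_humans) and n_orders > 1,
    }
```

Under these definitions a virus found in two Crocidura species is cross-species but not spillover-risk, because both hosts belong to a single order; adding a rodent host would make it spillover-risk, provided no human infection has been reported.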

Phylogenetic and recombination analyses

The reference protein sequences for each virus family were downloaded for phylogenetic analysis. Sequence alignments were performed using MAFFT (v7.505). Ambiguously aligned regions in the multiple sequence alignments were removed using MEGA7 (v7.0.26) and TrimAl (v1.4.rev15). The best-fitting model was determined by the ModelFinder program implemented in IQ-TREE (v1.6.12)32. Phylogenetic trees were reconstructed using the maximum likelihood (ML) method with 1000 bootstrap replicates. Potential recombination events within viral genomes and possible recombination breakpoints were identified using Simplot (v3.5.1)33 and RDP (v4.97)34.
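One possible way to chain the alignment, trimming, and tree-building steps is sketched below. The exact options used in the study are not stated, so every flag here is an assumption: `--auto` for MAFFT, `-automated1` for trimAl, and `-m MFP` (ModelFinder followed by tree inference) with `-b` (standard non-parametric bootstrap) for IQ-TREE 1.x. The helper `build_phylogeny_commands` is hypothetical and only assembles the command lines; it does not run them, and the manual MEGA7 step is not represented.

```python
from typing import List


def build_phylogeny_commands(fasta: str, prefix: str, bootstraps: int = 1000) -> List[List[str]]:
    """Return command lines for alignment, trimming, and ML tree inference.

    All options are illustrative assumptions, not the study's settings.
    """
    aligned = f"{prefix}.aln.fasta"
    trimmed = f"{prefix}.trim.fasta"
    return [
        # MAFFT writes the alignment to stdout; the caller must redirect it to `aligned`
        ["mafft", "--auto", fasta],
        # trimAl removes poorly aligned columns using its automated heuristic
        ["trimal", "-in", aligned, "-out", trimmed, "-automated1"],
        # IQ-TREE selects the best-fitting model, then infers the tree with bootstraps
        ["iqtree", "-s", trimmed, "-m", "MFP", "-b", str(bootstraps)],
    ]
```

Each list can be passed to `subprocess.run`, with the MAFFT output redirected to the alignment file before the trimming step is started.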

Statistical analysis

To assess viral diversity in relation to sampling location, shrew species and season, we compared the richness and alpha diversity (Shannon index) observed across five sampling locations (Shandong, Jiangsu, Zhejiang, Fujian, and Guangdong), six shrew species (i.e., An. squamipes, Cr. lasiura, Cr. shantungensis, Cr. tanakae, So. caecutiens, and Su. murinus), and four seasons, respectively. Pairwise Wilcoxon rank sum tests were performed to determine the differences in pairwise comparisons. Statistical analysis was performed using R software (version 4.2.2).
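For reference, the two diversity measures named above can be computed per library as in the following sketch. The original analysis was carried out in R; this Python version is only illustrative, the function names are hypothetical, and it assumes the natural logarithm for the Shannon index, H = −Σ p_i ln p_i, where p_i is the relative abundance of viral species i (for example, its share of total viral RPM).

```python
import math
from typing import Sequence


def richness(abundances: Sequence[float]) -> int:
    """Number of viral species with non-zero abundance in a library."""
    return sum(1 for a in abundances if a > 0)


def shannon_index(abundances: Sequence[float]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over species with non-zero abundance."""
    total = sum(a for a in abundances if a > 0)
    if total == 0:
        return 0.0  # no viruses detected
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p)
    return h
```

A library containing two equally abundant viral species has a richness of 2 and a Shannon index of ln 2 (about 0.693), while a library dominated by a single species has an index of 0. The per-library values would then be compared between groups with pairwise Wilcoxon rank sum tests, which are not reproduced here.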