Introduction

Geographical separation and extreme climates have resulted in the long isolation of Antarctica and the subantarctic islands. The result is a unique assemblage of animals, some relying entirely on the frozen continent, with others utilising the fringes. Such geographic isolation has been proposed to explain why Antarctic fauna supposedly harbour a paucity of viruses, and is supported by the observation that captive Antarctic penguins are highly susceptible to infectious disease [1]. It has therefore been hypothesised that Antarctic fauna have evolved in a setting of low “pathogen pressure”, reflected in limited microbial diversity and abundance [1, 2]. As a consequence, the potential for climate driven and human mediated movement of microorganisms makes the expansion of infectious diseases to the Antarctic a matter of concern [1, 3,4,5].

To date, a small number of viral species have been described in Antarctic fauna [6]. Serological studies have revealed that Antarctic penguins are reservoirs for influenza A virus (IAV), avian avulaviruses (formerly avian paramyxoviruses), birnaviruses, herpesviruses, and flaviviruses [7,8,9,10,11,12,13]. Despite improvements in the molecular tools for virus detection, it is only in recent years that full viral genomes have been characterised [6]. For example, adenoviruses, astroviruses, paramyxoviruses, orthomyxoviruses, polyomaviruses, and papillomavirus have been identified in Adélie penguins (Pygoscelis adeliae), Chinstrap penguins (Pygoscelis antarctica), and Gentoo penguins (Pygoscelis papua) [6, 14,15,16,17,18,19,20,21,22]. However, sampling is limited and genomic data sparse, such that we have a fragmented understanding of virus diversity in penguins and in Antarctica in general.

Antarctic penguins may also be infected by viruses spread by ectoparasites, particularly the seabird tick Ixodes uriae (White) [23]. For example, seven different arthropod-borne viruses (arboviruses) were identified in I. uriae ticks collected from King penguin colonies on Macquarie Island in the subantarctic [24]: Nugget virus, Taggert virus, Fish Creek virus, Precarious Point virus, and Catch-me-cave virus, all of which belong to the order Bunyavirales (Nairoviridae and Phleboviridae), a member of the Reoviridae (Sandy Bay virus, genus Orbivirus), and a member of the Flaviviridae (Gadgets Gully virus) [23,24,25]. Notably, I. uriae is the only species of tick with a circumpolar distribution and is found across the Antarctic peninsula [26, 27]. I. uriae are mainly associated with nesting seabirds and feed on penguins in the summer months, using off-host aggregation sites for the remainder of the year [28,29,30].

Despite concern over virus emergence in Antarctica there remains little understanding of virus diversity in Antarctic species, nor how virome diversity in Antarctic species relates to that seen in other geographic regions. Herein, we determined the RNA viromes of three species of penguins (Gentoo, Chinstrap, and Adelie) on the Antarctic peninsula, as well as I. uriae ticks that parasitise these birds. With these data in hand we addressed the drivers of virus ecology and evolution in this remote and unique locality.

Materials and methods

Sample collection

Samples were collected as described previously [15, 16]. Briefly, samples were collected from the South Shetland Islands and the Antarctic peninsula from Gentoo, Chinstrap, and Adelie penguins in 2014, 2015, and 2016, respectively (Table 1). A cloacal swab was collected from each penguin using a sterile-tipped swab, placed in viral transport media, and stored at −80 °C within 4–8 h of collection. We selected this sample type as it is largely non-invasive and is the standard sample type for viruses, such as influenza A virus.

Table 1 Metadata for samples included in this study.

Gentoo, Chinstrap, and Adelie penguins were sampled on Kopaitik Island, Rada Covadonga, 1 km west of Base General Bernardo O’Higgins (63° 19′ 5″ S and 57° 53′ 55″ W). Kopaitik Island is a mixed colony containing these three penguin species, although no survey has been performed since 1996 [31]. Gentoo penguins were also sampled adjacent to González Videla Base, Paradise Bay (64° 49′ 26″ S and 62° 51′ 25″ W): this colony is comprised almost entirely of Gentoo penguins (3915 nests), with a single Chinstrap penguin nest reported in 2017 [31]. Chinstrap penguins were sampled at Punta Narebski, King George Island (62° 14′ 00″ S and 58° 46′ 00″ W) and Adelie penguins were sampled at Arctowski Station, Admiralty Bay, King George Island (62° 9′ 35″ S and 58° 28′ 17″ W). A census at the penguin colony at Punta Narebski in 2013 reported 3157 Chinstrap and 2378 Gentoo penguin nests. The penguin colony adjacent to Arctowski Station comprises both Adelie and Chinstrap penguins, with a 2013 census reporting 3246 and 6123 nests and 3627 and 6595 chicks, respectively [31] (Fig. 1).

Fig. 1: Map of the Antarctic peninsula and locations where Antarctic penguin samples and ticks were collected.

The relief map was sourced from Wikipedia, developed by user Kikos, and is distributed under a CC BY-SA 3.0 licence.

In 2017, the common seabird tick I. uriae at various life stages (adult male, adult female, and nymphs) were collected from Paradise Bay (Fig. 1). Ticks were collected under rocks within and directly adjacent to penguin colonies and placed in RNAlater (Ambion) and stored at −80 °C within 4–8 h of collection.

RNA library construction and sequencing

RNA library construction, sequencing and RNA virus discovery were carried out as described previously [32]. Briefly, cloacal swab samples from penguins were extracted using the MagMax mirVana™ Total RNA isolation Kit (ThermoFisher Scientific) on the KingFisher™ Flex Purification System (ThermoFisher Scientific). Extracted samples were assessed for RNA quality using the TapeStation 2200 and High Sensitivity RNA reagents (Agilent Genomics, Integrated Sciences).

RNA was extracted from ticks as described previously [33]. Briefly, ticks were washed in ethanol and homogenised in lysis buffer using a TissueRuptor (Qiagen) and RNA was extracted using the RNeasy plus mini kit (Qiagen). The quality and concentration of extracted RNA were assessed using the Agilent 4200 TapeStation. The ten penguin and five tick samples with the highest concentration corresponding to species/location/age were then pooled using equal concentrations and concentrated using the RNeasy MinElute Cleanup Kit (Qiagen) (Table S1).

Libraries were constructed using the TruSeq total RNA library preparation protocol (Illumina) and host rRNA was removed using the Ribo-Zero-Gold kit (Illumina) for penguin libraries and the Ribo-Zero Gold rRNA Removal (Epidemiology) kit (Illumina) for the tick libraries. Paired-end sequencing (100 bp) of the RNA library was performed on the Illumina HiSeq 2500 platform at the Australian Genome Research Facility (Melbourne). All sequence reads have been deposited in the Sequence Read Archive (SAMN09217486-87, SAMN13567271-74, and SAMN13567241-44). Virus consensus sequences have been deposited on GenBank (MT025058—MT0205177).

RNA virus discovery

Sequence reads were demultiplexed and trimmed with Trimmomatic followed by de novo assembly using Trinity [34]. No filtering of host/bacterial reads was performed before assembly. All assembled contigs were compared with the entire non-redundant nucleotide (nt) and protein (nr) database using blastn and diamond blast [35], respectively, setting an e-value threshold of 1 × 10⁻¹⁰ to remove false-positives.
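The e-value cut-off described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the tuple layout, the contig and reference names, and the choice to treat the threshold as inclusive are all assumptions made for illustration only.

```python
# Minimal sketch of e-value filtering on tabular blast-style hits.
# Assumptions (not from the study): each hit is a tuple of
# (contig_id, subject_id, e_value), and hits exactly at the
# threshold are retained.

E_VALUE_THRESHOLD = 1e-10


def filter_hits(hits, threshold=E_VALUE_THRESHOLD):
    """Return only the hits whose e-value is at or below the threshold."""
    return [hit for hit in hits if hit[2] <= threshold]


# Hypothetical example hits; the identifiers are placeholders.
example_hits = [
    ("contig_1", "ref_A", 1e-50),  # strong match, kept
    ("contig_2", "ref_B", 1e-5),   # weak match, discarded
    ("contig_3", "ref_C", 1e-10),  # exactly at threshold, kept
]

kept = filter_hits(example_hits)
kept_ids = [hit[0] for hit in kept]
```

A stricter (smaller) threshold discards more marginal matches at the cost of possibly missing highly divergent viruses.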

Abundance estimates for all contigs were determined using the RSEM algorithm [34]. All contigs that returned blast hits with paired abundance estimates were filtered to remove plant, invertebrate, fungal, bacterial and host sequences. Viruses detected in the penguin libraries were divided into those likely to infect birds and those likely associated with other hosts [36, 37]. This division was performed using a combination of phylogenetic analysis and information on virus associations available at the Virus-Host database (http://www.genome.jp/virushostdb/). The list was cross-referenced with known laboratory reagent contaminants [38]. Novel viral species were identified as those that had <90% RdRp protein identity, or <80% genome identity to previously described viruses. Novel viruses were named using the surnames of figures in the history of Antarctic exploration. Contigs returning blast hits to the RPS13 host reference gene in penguin libraries and the COX1 reference gene in tick libraries were included to compare viral abundance with host marker genes.
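The novel-species criterion stated above can be written as a simple decision rule. The sketch below is illustrative only: the function name, the use of percentages, and the handling of missing values are assumptions, and it does not reproduce how identities were actually computed in the study.

```python
# Minimal sketch of the novel-species rule: a virus is treated as novel
# when it has <90% RdRp protein identity OR <80% genome identity to the
# closest previously described virus.
# Assumptions (not from the study): identities are percentages, and a
# value of None means that measure was not available for the contig.

RDRP_PROTEIN_CUTOFF = 90.0
GENOME_CUTOFF = 80.0


def is_novel_species(rdrp_protein_identity=None, genome_identity=None):
    """Apply the two identity thresholds; either one is sufficient."""
    if rdrp_protein_identity is not None and rdrp_protein_identity < RDRP_PROTEIN_CUTOFF:
        return True
    if genome_identity is not None and genome_identity < GENOME_CUTOFF:
        return True
    return False


# Hypothetical identity values, chosen only to exercise each branch.
divergent_rdrp = is_novel_species(rdrp_protein_identity=35.7)
known_virus = is_novel_species(rdrp_protein_identity=95.0, genome_identity=85.0)
divergent_genome = is_novel_species(rdrp_protein_identity=95.0, genome_identity=75.0)
```

Because the two thresholds are joined by "or", a contig with a conserved RdRp can still be flagged as novel on genome identity alone.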

To determine whether any viruses identified in ticks were present in the penguin libraries we used Bowtie2 [39] to map the raw reads from each penguin library to the novel virus contigs identified in the tick libraries, and vice versa.

Virus genome annotation and phylogenetic analysis

Viruses were annotated as described previously [32, 33]. Viruses with full-length genomes, or incomplete genomes possessing the full-length RNA-dependent RNA polymerase (RdRp) gene, were used for phylogenetic analysis. Use of the well conserved RdRp is the standard in family-level phylogenetic analyses of RNA viruses, with most other genome regions exhibiting excessively high levels of sequence divergence. Amino acid sequences were aligned using MAFFT [40], with poorly aligned sites removed using trimAl [41]. The most appropriate model of amino acid substitution was determined for each data set using IQ-TREE [42] or PhyML 3.0 [43], and maximum likelihood (ML) trees were then estimated using PhyML. For initial family and clade level trees, SH-like branch support was used to determine the topological support for individual nodes. Virus clusters providing the most relevant background information to the novel viruses identified here were then extracted and phylogenetic analysis repeated using PhyML with 1000 bootstrap replicates. In the case of previously described viruses, phylogenies were also estimated using nucleotide sequences. Alignment information is presented in Table S1 and the alignments themselves can be found on github: https://github.com/erinhunter4/penguin_virome.

Viral diversity and abundance across libraries

Relative virus abundance was estimated as the proportion of the total number of viral reads in each library (excluding rRNA). All ecological measures in the penguin libraries were calculated only using viruses likely associated with birds. Host association is less complex in tick samples, and in this case we used the full tick data set, only excluding the Leviviridae that are associated with bacterial hosts. Analyses were performed using R v 3.4.0 integrated into RStudio v 1.0.143, and plotted using ggplot2.
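As a rough illustration of the abundance measure described above, the sketch below computes viral reads as a fraction of the non-rRNA reads in a library. The function name, argument names, and the exact choice of denominator are assumptions; the read counts are hypothetical and are not taken from Table 1.

```python
# Minimal sketch: relative viral abundance as a proportion of the reads
# in a library after excluding rRNA reads.
# Assumptions (not from the study): the denominator is total reads minus
# rRNA reads, and all inputs are raw read counts.


def relative_abundance(viral_reads, total_reads, rrna_reads=0):
    """Return viral reads as a fraction of non-rRNA reads."""
    denominator = total_reads - rrna_reads
    if denominator <= 0:
        raise ValueError("library has no non-rRNA reads")
    return viral_reads / denominator


# Hypothetical library: 1,520 viral reads among 1,000,000 non-rRNA reads.
fraction = relative_abundance(viral_reads=1520, total_reads=1_050_000, rrna_reads=50_000)
percent = fraction * 100
```

Multiplying the fraction by 100 gives the percentage form used when reporting abundance per library.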

Both the observed virome richness and Shannon effective [alpha] diversity were calculated for each library at the virus family and genus levels using the Rhea script sets [44], and compared between avian orders using the Kruskal–Wallis rank sum test. Beta diversity was calculated using the Bray–Curtis dissimilarity matrix, and virome structure was plotted as a function of nonmetric multidimensional scaling ordination and tested with Adonis tests, using the vegan [45] and phyloseq [46] packages.
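The three diversity measures named above can be sketched in a few lines. This is a plain Python illustration, not the Rhea, vegan, or phyloseq code used in the study. It assumes that Shannon effective diversity is the exponential of the Shannon entropy (natural log), that an empty library is assigned a diversity of zero, and that Bray–Curtis dissimilarity is computed on raw abundance vectors in the same taxon order; the abundance values are hypothetical.

```python
import math

# Minimal sketch of observed richness, Shannon effective diversity and
# Bray-Curtis dissimilarity for per-library abundance vectors.
# Assumptions (not from the study): counts are non-negative numbers,
# and both vectors passed to bray_curtis list the same taxa in the
# same order.


def richness(counts):
    """Number of taxa with non-zero abundance."""
    return sum(1 for c in counts if c > 0)


def shannon_effective(counts):
    """exp(Shannon entropy); returns 0.0 for a library with no reads."""
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return math.exp(entropy)


def bray_curtis(a, b):
    """Sum of absolute differences divided by the sum of all abundances."""
    denominator = sum(a) + sum(b)
    if denominator == 0:
        raise ValueError("both libraries are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Hypothetical abundance vectors over four viral families.
even_library = [10, 10, 10, 10]
skewed_library = [37, 1, 1, 1]
```

Under these assumptions an even library of four families has an effective diversity of four, whereas a library dominated by one family has the same richness but a much lower effective diversity, which is the pattern described for the adult female tick library.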

Results

Diversity and composition of Antarctic penguin and tick RNA viromes

We characterised the transcriptomes of six libraries comprising ten individuals each, corresponding to three Antarctic penguin species in three locations and four tick libraries comprising a total of 20 ticks (Table 1 and Fig. 1). RNA sequencing of rRNA depleted libraries resulted in 42,382,642–55,930,902 reads assembled into 189,464–530,470 contigs for each of the penguin libraries. The tick libraries contained 51,498,136–55,930,902 reads assembled into 55,611–223,554 contigs (Table 1). There was a large range in the total viral abundance in both the penguin (0.07–0.7% total viral reads; 0–0.15% avian viral reads) and tick libraries (0.03–2.4%) (Table 1 and Fig. 2). In addition to likely avian viruses, the penguin libraries contained numerous reads matching insect, plant, or bacterial viruses and retroviruses (Figs. 2a and 3). Retroviruses were excluded from later analyses due to the challenges associated with differentiating exogenous from endogenous sequences.

Fig. 2: Abundance of viruses found in penguins and their ticks.

a Abundance of all viral reads found in penguin libraries. b Abundance and diversity of avian viruses in each of the penguin libraries. c Abundance of the host reference gene RPS13 in penguin libraries. d Abundance of all viral reads found in the tick libraries. e Abundance and diversity of viruses in each of the tick libraries. f Abundance of the host reference gene COX1 in the tick libraries.

Fig. 3: Phylogenetic overview of the viruses found in penguins and ticks.

Viruses found in penguins were divided into two groups: those that infect birds and those likely associated with other hosts, denoted by magenta and orange boxes, respectively. Tick viruses revealed in this study are denoted by blue boxes. Grey boxes refer to reference viruses mined from RefSeq.

The abundance of RPS13, a stably expressed host reference gene in the avian colon [47], was similar across all penguin libraries, yet with lower abundance in the Adelie penguins (Fig. 2c). The abundance of COX1 in tick libraries was consistent with the body size of the ticks included in each library, with the highest abundance in the large adult female ticks and the lowest abundance in the first of two nymph libraries (Fig. 2d–f).

The abundance of avian viral reads was the highest in Chinstrap penguins sampled on Kopaitik Island (0.152% of total reads), and the lowest in Gentoo penguins sampled at González Videla Base (hereafter GGV), for which no avian viral reads were detected. Because this colony is comprised almost solely of Gentoo penguins, only this species was sampled [31]. Alpha diversity (the diversity within each library) was highest in the Adelie and Chinstrap penguins at both the viral family and genus levels, and was lower in Gentoo penguins, even when only considering RNA viromes from Kopaitik Island where all three penguin species were sampled (Figs. S1, S2). Hence, the reason we detected no avian viral reads at the most southern sampling site (GGV) may be due to a combination of location and species choice (Gentoo penguins).

Although there was variation in virus composition among libraries, members of the Picornaviridae were the most abundant in the Chinstrap and Adelie penguin libraries, comprising 99, 32, 83, and 71% of all the avian viral reads in these four libraries. In marked contrast, the Picornaviridae comprised only 0.25% of avian viral reads from Gentoo penguins on Kopaitik Island (and no avian viruses were found in the Gentoo penguins from GGV). Beta diversity demonstrated connectivity in the RNA viromes from the Adelie penguins, driven by a number of shared viral species across the Kopaitik Island and King George Island sampling locations that are 130 km apart (Figs. 3 and S3).

Within the tick libraries, the greatest virus abundance was seen in the adult female ticks, while the lowest virus abundance was observed in the adult male ticks. Alpha diversity was similar across all libraries. Interestingly, while virus richness was highest in the adult female ticks, Shannon diversity was lower than the other libraries (Fig. S4), although without replicates it is not possible to draw clear conclusions. Given the high virus richness in female ticks, it is not surprising that the largest number of viral species were also described in this library. The tick libraries were also highly connected, with 5/8 species shared among them, although the beta diversity calculations are confounded because of limited sample size (Fig. S3).

Substantial RNA virus diversity in Antarctic penguins and their ticks

Overall, 22 viral families, in addition to four viruses that fell outside well defined viral families but clustered with other unclassified “picorna-like” viruses (Treshnikov virus, Luncke virus, Dralkin virus, and Tolstivok virus), were identified in the penguin and tick libraries (Fig. 3). Of these, the likely bird associated viruses were members of the Astroviridae, Caliciviridae, Coronaviridae, Herpesviridae, Orthomyxoviridae, Paramyxoviridae, Picobirnaviridae, Picornaviridae, and Reoviridae (Figs. 2b and 3) (see below). Ten of the 13 avian associated viruses identified in the penguins likely represent novel avian viral species (Table S2 and Fig. 4), and two virus species (Avian avulavirus 17 and Shirase virus) were shared among Adelie penguins from different locations. There was no virus sharing among species at individual locations (i.e., no viruses were shared across penguin species at either Kopaitik Island or on King George Island) (Fig. 4), although this is likely because species were sampled in different years. All viruses in the tick libraries, with the exception of Taggert virus, represented novel virus species with amino acid sequence similarity to reference virus sequences from 31 to 81%. Five of the nine virus species described in ticks were shared across libraries, which is unsurprising given that the ticks were collected from the same population. Notably, the nymph libraries contained a higher percentage of viral RNA than both the large nymphs and adult male ticks.

Fig. 4: Bipartite network illustrating biologically relevant virus species for which viral genomes were found in each library.

Each library is represented as a central node, with a pictogram of the species, surrounded by each viral species. Line lengths do not correspond to any variable. Where two libraries share a virus species, the networks between the two libraries are linked. Virus colour corresponds to virus taxonomy. Viruses identified in penguin libraries that are unlikely to be bird associated are not shown. A list of viruses from each library is presented in Table S2, and phylogenetic trees for each virus family can be found in Figs. 5–7 and S5–S13.

Strikingly, we identified 82 divergent novel virus species in the penguin libraries that clustered phylogenetically within 11 defined families, as well as three viruses that clustered with a group of unclassified viruses. These unclassified viruses are likely associated with penguin diet or their microbiome: fish, invertebrates, plants, fungi and bacteria (Fig. 2a, Table S3, Fig. 3). The largest diversity was found in the “Narna-Levi”, “Noda-Tombus” and “Picobirna-Partiti” viral groups [36]. A number of viruses were highly divergent, including clusters of novel viruses that fell within the Narnaviridae and Leviviridae (Table S3 and Fig. 3). Overall, 56 different species of Narna-Levi viruses were identified in Adelie penguins, comprising approximately half of the Narna-like viruses and 21/25 of the Leviviridae: these were likely associated with bird diet or microbiome. All invertebrate associated Picobirnaviridae were found in Chinstrap penguin libraries, while a single picobirnavirus identified in an Adelie penguin library was most closely related to other bird associated viruses (see below). As these 82 viruses are unlikely to be associated with penguins or their ticks, they are not described further.

Novel avian viruses

The novel Wilkes virus was identified in an Adelie penguin on Kopaitik Island, and belongs to the genus Nacovirus (Caliciviridae)—a group dominated by avian viruses (Fig. S5). This virus is closely related to Goose calicivirus and caliciviruses sampled from waterbirds in Australia (i.e., Red-necked Avocet and Pink-eared Duck) [32, 48]. All the picornaviruses identified in this study likely belong to novel or unassigned genera (Fig. S6). Three different variants of Shirase virus were identified in Adelie penguins, two from King George Island and one from Kopaitik Island. Interestingly, Shirase virus falls as a sister lineage to viruses of the genus Gallivirus. Similarly, two variants of Wedell virus were identified in Chinstrap penguins. Wedell virus falls in an unassigned lineage of picornaviruses different than avian viruses identified in metagenomic studies [32, 48]. Three additional picornaviruses were identified in Chinstrap penguins—Ross virus, Scott virus, and Amundsen virus—that fall basal to members of the genus Tremovirus.

Rotaviruses were identified in both Gentoo (Shackleton virus) and Adelie (Mawson virus) penguins. Shackleton virus falls as an outgroup to a clade of rotaviruses recently described in wild birds, which are themselves divergent from rotavirus G, while Mawson virus is a sister group to rotavirus D (Fig. 5a). Hilary virus, a picobirnavirus, was found in a clade that contains both avian and mammalian hosts. Interestingly, this virus is most closely related to a human picobirnavirus, albeit with low amino acid similarity and long branch lengths (Fig. S7). Although there is uncertainty as to whether these viruses are bacterial rather than vertebrate associated [49], they are retained here for comparative purposes.

Fig. 5: Phylogenies of select novel viruses found in penguins.

a Phylogenetic tree of the VP1, containing the RdRp, of rotaviruses. The tree is mid-point rooted for clarity only. b Phylogeny of the concatenated major capsid gene and glycoprotein B gene, the only genes recovered, of the Alphaherpesvirinae. Two betaherpesviruses were used as outgroup to root the tree. The viruses identified in this study are denoted with a filled circle and in bold. Bootstrap values >70% are shown for key nodes. The scale bar represents the number of amino acid substitutions per site.

Finally, although most of the novel viruses documented here had RNA genomes, we also identified a novel alphaherpesvirus, Oates virus, that falls as a sister group to Gallid and Psittacid herpesvirus 1. Notably, this virus was distantly related to an alphaherpesvirus previously described in penguins (Spheniscid alphaherpesvirus) (Fig. 5b).

Avian RNA viruses previously detected in penguins

Previous studies of Antarctic penguins have detected avian IAV and avian avulaviruses [15, 16, 21]. Similarly, we detected an H5N5 IAV in Chinstrap penguins identical in sequence to that reported previously. This is not surprising as the virus described in Hurt et al. [15] was isolated in the same set of samples (Fig. S8). In addition, we identified Avian avulavirus 17 (AAvV-17) in Adelie penguins from both sampling locations (Figs. 6a and S9). This virus was previously isolated in Adelie penguins in 2013 [21] and Gentoo penguins in 2014–2016 [50]. Analysis of the F gene of AAvV-17 indicates that the virus detected in Adelie penguins on both King George Island and Kopaitik Island was more closely related to that from Gentoo penguins sampled between 2014 and 2016 [50] than to the Adelie penguins sampled in 2013 [21] (Fig. 6a). Although AAvV-17 was detected in penguins sampled at two locations (Kopaitik Island and King George Island) only 2 weeks apart, they shared only 98.6% identity. Blastx results also indicated the presence of Avian avulavirus 2, although we were unable to assemble the virus genome.

Fig. 6: Phylogeny of two previously described viruses in penguins.

a Phylogeny of the F gene of Avian avulavirus 17. Detection location for viruses identified in this study and Wille et al. [21] are denoted by either a green filled circle (King George Island) or blue filled triangle (Kopaitik Island). Strain names for reference viruses are as presented with the same nomenclature as originally presented in Wille et al. [21]. Avian avulavirus 18 was used as outgroup to root the tree. The scale bar represents the number of nucleotide substitutions per site. b Phylogenetic tree of the ORF1ab, including the RdRp, of avastroviruses. The tree is mid-point rooted for clarity only. The scale bar represents the number of amino acid substitutions per site. Bootstrap values > 70% are shown for key nodes. Viruses identified in this study are denoted in bold.

We also identified a deltacoronavirus and an avastrovirus (Figs. 6b and S10–S12). The deltacoronavirus was similar to those reported in birds in the United Arab Emirates, Australia, Niger, and Finland, with ~95% identity. A lack of sampling makes it challenging to determine how deltacoronaviruses in Antarctica and other continents may be shared (Figs. S10, S11). The astrovirus detected was similar (88.3% identity) to a short fragment (1000 bp) previously reported in Adelie penguins on the Ross ice shelf of Antarctica [22] (Table S2), a pattern confirmed by phylogenetic analysis (Fig. 6b). Phylogenetic analysis also reveals that this virus falls in an outgroup to Group 2 viruses, including Avian Nephritis virus (Figs. 6b and S12). Although we were unable to determine the epidemiology of these viruses in Antarctica, repeated detection on opposite ends of the Antarctic continent makes it possible that this is a penguin specific virus.

Tick associated viruses

The most abundant virus identified within the I. uriae ticks sampled here was a variant of Taggert virus, a nairovirus (order Bunyavirales) previously identified in penguin associated ticks on Macquarie Island: the contigs identified in our data showed 81.6% nucleotide sequence similarity in the RdRp region to Taggert virus [25] (Fig. 7). This Taggert virus variant accounted for 2.0% of total reads (87% of viral reads) in the adult female library and was found in all tick libraries. Because the nucleotide sequences of Taggert virus differed between libraries it is unlikely that they represent cross-library contamination. In addition, we identified 75 reads of Taggert virus in the library containing samples from Chinstrap penguins on Kopaitik Island. Importantly, the tick and penguin libraries were not sequenced on the same lane, or even in the same time frame, thereby excluding contamination. Two other members of the order Bunyavirales were also discovered, Ronne virus and Barre virus, both members of the Phenuiviridae (Fig. S13) that exhibited 80% amino acid sequence similarity across the RdRp segment. Ronne virus was identified in three of the tick libraries (adult male, adult female, and nymph library 1) but Barre virus was identified only in a single library (nymph library 2) (Fig. 4).

Fig. 7: Phylogenies of tick arboviruses.

a The RdRp segment of select members of the Reoviridae, including the genus Coltivirus. b The RdRp of select members of the Bunyavirales including the family Nairoviridae. The novel tick viruses identified in this study are denoted with a filled circle and in bold. The tree has been mid-point rooted for clarity only. Bootstrap values >70% are shown for key nodes. The scale bar represents the number of amino acid substitutions per site.

The six other novel virus species identified in the tick libraries comprised five viral families: Iflaviridae-like, Alphatetraviridae, Reoviridae, Rhabdoviridae, and Leviviridae. A novel Ifla-like virus, Gerbovich virus, was identified within both nymph libraries. This virus clustered with a group of tick associated ifla-like viruses, including Ixodes holocyclus iflavirus and Ixodes scapularis iflavirus (Fig. S13). Two sister species of virus were identified within the Alphatetraviridae—Bulatov virus and Vovk virus—that showed 76.1% amino acid sequence similarity across the RdRp region. These two viruses are highly divergent from all other RdRp sequences currently available, exhibiting just 35.7% amino acid sequence similarity to the divergent tick-borne tetravirus-like virus (Fig. S13). A novel colti-like virus (Reoviridae), Fennes virus, was identified in the adult male, female and nymph libraries, although we were only able to assemble four segments. Notably, Fennes virus falls basal to the existing coltivirus group, exhibiting just 30% amino acid sequence similarity to Shelly headland virus, recently identified in I. holocyclus ticks from Australia (Fig. 7). The partial genome of a rhabdovirus, Messner virus, was identified in the adult female library. However, this fragment was of low abundance and comprised only the RdRp segment (Fig. S13).

Finally, Mackintosh virus, identified in all four tick libraries, was not associated with any other tick viruses. Instead, this virus clustered with viruses from the Leviviridae indicating that it is likely a bacteriophage (Table S2).

Discussion

The advent of metagenomic sequencing and improved sampling has rapidly accelerated the rate of microbial discovery in the Antarctic. Indeed, viruses have now been identified both in the environment (e.g., Antarctic lakes), and in wildlife. We aimed to assess whether Antarctic penguin colonies experience low pathogen pressure as a result of their geographic and climatic isolation, employing meta-transcriptomic virus discovery from three penguin species and their ticks. Although our sample size was relatively small, we were able to demonstrate the presence of 13 viral species in three species of penguins from the Antarctic peninsula and nine in ticks associated with penguin nesting sites. These preliminary data therefore counter the idea that animals in the Antarctic harbour less microbial diversity than animals from other geographic regions. Indeed, the penguins sampled show similar levels of RNA virome diversity as Australian wild birds [32, 48]. Recent virome studies of Australian birds, using the same sample type, revealed an alpha diversity (observed richness) of 5.37 and 5.8 per library, with an average of 2.87 and 3.1 viral genomes and 60 and 80% of viruses being novel [32, 48]. In comparison, in the penguins studied here we observed an average richness of 4.6 and 2.8 viral genomes per library, with a virus discovery rate of 76%. There was also an impressive level of viral diversity in the tick libraries considering the small sample size: eight novel virus species and a single previously identified species were identified in 20 ticks, compared with 19 novel viruses in 146 ticks from Australia [33]. Finally, in both the penguin and tick associated viruses revealed we identified similar viral families to those documented previously using the same methods and sample type [32, 33, 48]. This strongly suggests that these families and genera are associated with a huge diversity of birds and ticks across the globe, providing a viral connectivity between geographically distinct localities.

Notably, as all the penguins sampled appeared healthy, the disease-causing capacity of these viruses is uncertain. Ten Chinstrap penguins sampled on King George Island harboured five different viral species, mostly from the Picornaviridae, at very high abundance (0.15% of total reads). Adelie penguins also had high viral diversity, with apparently healthy birds carrying three or four viral species. Perhaps more striking was that Adelie penguins on King George Island and the Antarctic peninsula shared viruses, despite the greater than 100 km distance between these colonies. A similar trend was observed by Wille et al. [21] who found that Adelie penguins sampled in 2013 shared avian avulavirus 17 and 19 across these two locations, thereby revealing a connectivity between penguin colonies. Whether this is due to overlapping foraging grounds, prospecting birds visiting different colonies, viral vectors in the form of predatory and scavenging birds such as Southern Giant Petrels (Macronectes gigantes), Kelp Gulls (Larus dominicanus) or Skuas (Stercorarius spp.) [51, 52], or another unimagined route is unclear. Penguins sampled on King George Island and Kopaitik Island had similar alpha diversity at the virus family and genus levels. Interestingly, no avian viral reads or genomes were detected in the samples from Gentoo penguins at GGV. Whether this is due to geographic structuring of avian viruses in Antarctica, the species sampled at this location (i.e., Gentoo penguins tended to have lower diversity than either Adelie or Chinstrap penguins) or another process merits further investigation.

Combined, these data strongly suggest that penguins are not merely spill-over hosts, but may be central reservoir hosts for a diverse range of viruses. This is apparent in two observations. The first is the repeated detection of specific virus species, such as avian avulaviruses, which comprise distinct clusters of related variants. Antarctic penguins have been sampled since the 1970s, and avian avulaviruses have repeatedly been detected, both by serology and PCR. The detection of phylogenetically related avian avulavirus 17 in 2013 in Adélie penguins [21], in 2014–2016 in Gentoo penguins [50], and again in 2016 in Adélie penguins as shown here, strongly suggests that these animals are an important reservoir for these viruses. Although the IAV we detected was the same virus as described previously [15], the long branch lengths in the phylogenetic trees suggest long-term undetected circulation in Antarctica [15].

The second key observation indicating that penguins are potential virus reservoirs was the presence of likely arboviruses, which is why we paired our analysis of the penguin RNA virome with that of the ticks that parasitise them. Of the nine species of viruses identified within the ticks in this study, two clustered phylogenetically with other arboviruses: the previously detected Taggert virus fell within the Nairoviridae, while Fennes virus was a member of the Reoviridae. Taggert virus was originally identified in I. uriae collected from penguin colonies, and is one of seven I. uriae associated virus species identified in ticks collected from penguin colonies on Macquarie Island [25]. Taggert virus falls phylogenetically within the genus Orthonairovirus, close to the pathogenic arbovirus Crimean-Congo haemorrhagic fever virus. Interestingly, we identified Taggert virus not only in all four tick libraries, but also in Chinstrap penguins on Kopaitik Island. This strongly supports the idea that penguins act as a reservoir host for Taggert virus [25]. In this context it is important to note that all penguin and tick samples were processed in separate laboratories and sequenced separately, thereby eliminating cross-library contamination.

Also of note was Fennes virus, which clustered phylogenetically within the genus Coltivirus. This genus includes the pathogenic tick-borne virus Colorado tick fever virus, as well as a number of tick associated viruses and a species identified in African bats [33, 53,54,55]. Notably, Fennes virus fell in a basal position and was relatively divergent from the other coltiviruses. The vertebrate reservoirs of coltiviruses have only been confirmed for Colorado tick fever virus and Tai forest reovirus (rodents and free-tailed bats, respectively), although other members of the genus are suspected to infect rodent species. Interestingly, the viruses identified in I. uriae from Macquarie Island in a series of three studies between the 1970s and 2009 belonged to just four families (Reoviridae, Nairoviridae, Phenuiviridae, and Flaviviridae) [23,24,25], three of which were present here. Our phylogenetic analysis suggested that six of the remaining seven virus species identified within the tick data were likely associated with invertebrates, with one other virus (Mackintosh virus) likely a bacteriophage from the family Leviviridae.

The penguin samples also contained an extensive diversity of viruses that were likely associated with hosts other than birds, including entire clades of novel viruses within phylogenetic trees of the Narnaviridae and Leviviridae. Given their phylogenetic position, these viruses are likely associated with the fish, crustacean, and plant species ingested by the penguins as a part of their diet, or infect unicellular parasites, fungi, and bacteria. A number of these viruses may also be associated with penguin gut flora: indeed, cloacal swabs are used extensively in studies of bird gut microbiomes [56, 57]. Due to the nature of the cloacal swabs, it is impossible to accurately determine the host for these viruses, although some information can be gleaned from the families in which these viruses fall. For example, the Narnaviridae are known to infect fungi and protists, while the related Leviviridae infect bacteria [58,59,60,61]. Other novel viruses fell within invertebrate associated clades of the Nodaviridae and Tombusviridae, or clustered with both vertebrate and invertebrate infecting viruses in the Picornavirales [36]. There were also a number of novel viruses that clustered within the Picobirnaviridae/Partitiviridae group. While the Partitiviridae are recognised as invertebrate associated viruses, the host association of the Picobirnaviridae is currently uncertain [49]. Overall, this demonstrates remarkable undescribed viral richness in those organisms that comprise the diet of Antarctic penguins.

In sum, we reveal substantial viral diversity in Antarctic penguins, their diet, and their ticks. We therefore expect that additional viruses will be identified with increased sampling and a wider variety of sample types, reflecting what is in reality a relatively high diversity of unique fauna and flora on the Antarctic continent. Clearly, additional sampling of penguins and other species in Antarctica is critical to elucidate the epidemiological connection between Antarctica and the rest of the globe, and from this to better understand the mechanisms of viral introduction and circulation.