Revisiting the genetic diversity of emerging hantaviruses circulating in Europe using a pan-viral resequencing microarray

Hantaviruses are zoonotic agents transmitted from small mammals, mainly rodents, to humans, in whom they cause diseases such as Hemorrhagic Fever with Renal Syndrome (HFRS) and its mild form, Nephropathia Epidemica (NE), or Hantavirus Cardio-Pulmonary Syndrome (HCPS). Hantaviruses are spread worldwide, and monitoring their animal reservoirs is of primary importance to control the zoonotic risk. Here, we describe the development of a pan-viral resequencing microarray (PathogenID v3.0) able to explore the genetic diversity of rodent-borne hantaviruses endemic in Europe. Among the approximately 800 sequences tiled on the microarray, 52 correspond to a tight molecular sieve of hantavirus probes covering a large genetic landscape. RNAs from infected animal tissues or from laboratory strains were reverse transcribed, amplified, then hybridized to the microarray. A classical BLASTN analysis applied to the sequence delivered by the microarray allows identification of the hantavirus species down to the exact geographical variant present in the tested samples. Geographical variants of the most common European hantaviruses, such as Puumala virus, Dobrava virus and Tula virus, from France, Germany, Slovenia and Finland were genetically discriminated. Furthermore, we precisely characterized geographical variants that were still unknown when the chip was designed, such as Seoul virus isolates that recently emerged in France and the United Kingdom.

Fennoscandia region, Germany, Belgium, the Netherlands, France and, more recently, the United Kingdom 2-8 . Tools to investigate the genetic diversity of hantaviruses in their animal reservoirs across their geographical distribution are particularly needed to better understand the epidemiology, to predict outbreaks of hantavirus disease in humans and to set up appropriate public health measures [9][10][11][12][13][14] .
The recent development of molecular methods that do not rely on an a priori hypothesis has brought a critical benefit to the diagnosis of, and research on, infectious diseases. These methods make it possible to obtain the sequence of a pathogen present in a biological sample in the absence of the specific probes required for classic PCR or hybridization [15][16][17][18][19][20][21][22][23] . Among them, Next Generation Sequencing (NGS) has become widely used for the identification and whole-genome sequencing of novel animal or zoonotic pathogens and for metagenomic analysis 22,23 . These techniques are increasingly accessible in terms of equipment and cost, but still require a complex downstream bioinformatics analysis, which may represent a cumbersome step demanding specific expertise. In contrast, resequencing microarrays do not require sophisticated analysis for interpretation. Indeed, the sequence obtained following hybridization of the amplified genetic material on the chip is used without any intermediate step for both BLASTN enquiry and phylogenetic analysis. Although random amplification, unlike classic or multiplex PCR assays, does not specifically enrich the viral material present in the specimen, the subsequent hybridization step on the resequencing microarray further improves the sensitivity and the specificity of the assay. In addition, the assay has the advantage of tolerating high divergence levels, up to 20%, permitting a precise characterization from a short stretch of about 20 detected nt 20 , which can hardly be achieved when only a few reads are obtained in NGS assays. Resequencing microarrays have been broadly used over the last ten years for the detection and identification of emerging agents, such as monkeypox virus 17,18 , pandemic influenza viruses 19 , hemorrhagic fever viruses 20 and rhabdoviruses 21 .
Several steps of the assay can be performed in the field (mobile laboratory), from sample preparation and random amplification by φ29 polymerase (stable at room temperature) up to hybridization on the microarray. Specific equipment (fluidic station/scanner) is required only for post-hybridization processing and scanning.
PathogenID is a collaborative effort of teams of the Institut Pasteur that combined their expertise to develop 3 generations of resequencing microarrays for the detection of different emerging pathogens, bacteria and viruses, during public health emergencies and for research studies. The 1st and 2nd generations of the microarray contained only a limited number of viral sequences, 46 (PathogenID v1.0) and 126 (PathogenID v2.0), respectively. In particular, only sequences of six prototype orthohantaviruses, mainly associated with human diseases, were included: PUUV, DOBV, SEOV, Hantaan orthohantavirus (HTNV), Sin Nombre orthohantavirus (SNV) and Andes orthohantavirus (ANDV) 17,20 . The targeted sequence corresponded to part of the large (L) segment coding for the RNA-dependent RNA polymerase (RdRp), reputed to be the most conserved region of the genome 24 . While PathogenID v2.0 had been validated for the detection of DOBV, SEOV, HTNV, ANDV and SNV in infected Vero E6 cells, it had not been validated for PUUV, even though PUUV is the most commonly circulating orthohantavirus in Europe 20 . Therefore, the 3rd generation resequencing microarray (PathogenID v3.0), employed in the present cooperative study at the European scale, was specialized for virus detection and contains more than 800 viral sequences. This pan-viral resequencing microarray covers many viral pathogens critical for both animal and public health, in particular most of the known circulating species and variants of zoonotic viruses. Among those sequences, 52 were strategically chosen to cover the diversity of hantaviruses, in particular the most frequent rodent-borne species, which are the only ones reported to date to have zoonotic potential. The small (S) segment of the genome, encoding the nucleocapsid (N) protein, was chosen as the target for probe design since it is more efficient for discriminating variants within a species; in addition, more reference sequences were available in GenBank at the time the pan-viral chip was designed.
The present manuscript illustrates the use of the PathogenID v3.0 resequencing microarray to map, according to their geographical distribution, the genetic diversity of several hantaviruses endemic in Europe and mainly associated with human disease, such as PUUV, DOBV (including the Dobrava and Saaremaa genotypes 25 ), TULV, SEOV and Topografov hantavirus (TOPV).

Methods
Design of PathogenID v3.0 pan-viral resequencing microarray. The objective in designing the 3rd generation resequencing microarray (PathogenID v3.0) was to reach the widest coverage of virus diversity of both medical and veterinary interest. Due to the technical limits of the microarray, we selected the minimum number of probes for each viral species/variant in order to include the highest number of sequences. For each viral family, the included probes were chosen taking into account the reference sequences published in GenBank and our experience with the earlier generations of the microarray (PathogenID v1.0 and PathogenID v2.0) [17][18][19][20][21] .
PathogenID v3.0 contains 838 sequences, including virus prototypes and their variants belonging to different families (complete list available from the authors upon request). Regarding hantaviruses, 52 partial S segment sequences encoding the N protein were included (Table 1; Supplement 1), following the ICTV taxonomy in force when the chip was designed, which guided the choice of the probe sequences. The length of the tiled sequences was: i) 425 nucleotides (nt) for those available in GenBank, with the exception of seq234 (251 nt); ii) 303 nt for those sequenced in the laboratory during previous analyses and not yet published at the time of the study (Table 1; Supplement 1).
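The tiled lengths above translate directly into probe numbers under the Affymetrix tiling scheme described in this section (overlapping 25-nt probes shifted by one nucleotide, four probes per window differing only at the central 13th base). The Python sketch below is a hypothetical illustration of that scheme for a single strand; the function names are ours, and the real manufacturing layout (strand coverage, control probes) may differ.

```python
"""Hypothetical sketch of resequencing-array tiling for one strand of a tiled sequence."""

PROBE_LEN = 25  # probe length (nt), as described for PathogenID v3.0


def tile_probes(seq, probe_len=PROBE_LEN):
    """Return (1-based start, [A, C, G, T probes]) for each overlapping window.

    The four probes of a window are identical except at the central position
    (the 13th base for 25-nt probes), which carries each of the four alleles.
    """
    seq = seq.upper()
    centre = probe_len // 2  # 0-based index 12, i.e. the 13th base
    tiles = []
    for start in range(len(seq) - probe_len + 1):
        window = seq[start:start + probe_len]
        quartet = [window[:centre] + base + window[centre + 1:] for base in "ACGT"]
        tiles.append((start + 1, quartet))
    return tiles


def probe_count(length, probe_len=PROBE_LEN):
    """Number of single-strand probes (4 per window) needed for a tiled sequence."""
    return 4 * max(0, length - probe_len + 1)


if __name__ == "__main__":
    # Tiled lengths used for the hantavirus sequences on the chip.
    for length in (425, 251, 303):
        print(f"{length} nt -> {length - PROBE_LEN + 1} windows, {probe_count(length)} probes")
```

Under these assumptions, a 425-nt tiled sequence needs 401 windows (1604 probes on one strand), which illustrates why the number of probes per species had to be kept to a minimum.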
Upon sequence selection, PathogenID v3.0 was manufactured by Affymetrix (Santa Clara, California) according to its high-density resequencing approach, based on stepwise overlapping 25-nt probes, the first covering positions 1-25 of the tiled sequence, the second positions 2-26, and so on: each window is interrogated by a set of 4 probes differing only at the central 13th position (i.e. A, C, G or T) 16 .
www.nature.com/scientificreports/
to evaluate the potential of the microarray for hantavirus detection, prior to testing the field animal samples. The PUUV plasmid was taken as reference and tested alone and in a pool with the HTNV and TULV plasmids.
Laboratory strains. Laboratory prototype strains of PUUV, DOBV (genotypes Dobrava and Saaremaa), TULV and TOPV isolated on Vero E6 cells in BSL3 containment were provided by the participating virology laboratory in Finland [26][27][28][29][30] (Fig. 1).
France. PUUV, the most frequent hantavirus circulating in France, was obtained from bank vole samples captured in 2011 in the Ardennes region 31 , and SEOV (Lyon strain) from Norway rats 39 .
Finland (Fennoscandia region). Lung tissues from PUUV-positive bank voles captured in Konnevesi in 2008 were used 30 .
Germany. PUUV and TULV RNAs originating from lungs of animal reservoirs captured across Germany 32-36 were used.
All experiments were performed in accordance with relevant guidelines and regulations. RNAs extracted from tissue samples of wild animals were received from various Hantavirus Reference Centers in Europe (Fig. 1) through the EU program EVA (European Virus Archive, n° 228292), which facilitates access to virus/tissue libraries under an MTA (Material Transfer Agreement). All of them have been previously published in peer-reviewed literature. All handling procedures of captured rodents followed the regulations of each respective country. The species studied are not protected and all efforts were made to minimize animal suffering.
RNA extraction. RNA was extracted from both animal organ homogenates and cell supernatants using the QIAamp Viral RNA Extraction kit (Qiagen). cDNA was synthesized using the SuperScript III system (Invitrogen, Thermo Fisher).

Genetic detection and phylogeographical characterization of hantaviruses by the pan-viral resequencing microarray.
Random amplification. Genetic material, either plasmid DNA, or cDNA from animal organs or cell supernatants, was amplified by WGA (Whole Genome Amplification) and WTA (Whole Transcriptome Amplification) approaches, respectively, using φ29 polymerase-mediated random amplification (Qiagen), followed by a ligation step 17 .
Hybridization on the microarray and sequence detection. Amplified products were hybridized overnight at 45 °C on the PathogenID v3.0 microarray after fragmentation and labeling using the GeneChip Resequencing Reagent Kit (Affymetrix). Chips were then washed, and fluorescence was detected and scanned using the Affymetrix equipment (Wash Control, Scan Control). Resequencing analysis was performed using the GSEQ 4.1 software (Affymetrix). For each of the 52 hantavirus sequences tiled on the chip (positions seq222 to seq273), an output sequence was obtained in .txt format, with determined (A, G, T or C) or undetermined (N) positions (example in Supplement 2). The significant sequences obtained were used for Call Rate calculation and BLASTN analysis.
Call Rate calculation. Call Rate (CR) was calculated as the ratio (%) between the number of determined ('called') nucleotides (i.e. A, G, T or C) following hybridization and the total number of nucleotides of each tiled sequence (i.e. 425, 251 or 303 nt for hantaviruses).
Phylogenetic analysis. Phylogenetic analysis was performed for PUUV (as representative of European orthohantaviruses) using (i) the reference S segment sequences available in GenBank, (ii) the sequences tiled on the chip and (iii) the sequences of the tested hantaviruses, when known.
First, a phylogenetic tree was constructed with the complete coding sequences of the references by the maximum-likelihood (ML) method with PhyML v3.0, implemented in SeaView (v4.6.1), under the most appropriate nucleotide substitution model as determined by the SMS program (available online at http://www.atgc-montpellier.fr/sms/) 42 . Branch supports were evaluated by the approximate likelihood-ratio test (aLRT SH-like). Then, the short sequences tiled on the chip and those of the tested hantaviruses were placed in this backbone tree using RAxML, available online on the CIPRES portal (http://www.phylo.org). Branch supports of these phylogenetic placements were evaluated by the rapid bootstrap procedure with the MRE-based bootstopping criterion, as recommended for the online version of the software.
For each sample giving a positive result following hybridization, the BLASTN result obtained from the output sequence was mapped onto the phylogenetic tree and compared with the tiled sequences that had permitted its detection and genetic characterization.
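The Call Rate defined above is a simple count over the GSEQ output. A minimal Python sketch follows, assuming the .txt output has already been reduced to a single string of A/C/G/T/N characters for one tiled sequence (the parsing step is not shown, since the file layout is only illustrated in Supplement 2); the function name is hypothetical.

```python
from typing import Optional

CALLED_BASES = frozenset("ACGT")


def call_rate(output_seq: str, tiled_length: Optional[int] = None) -> float:
    """Call Rate (%): called positions (A, C, G or T) over the tiled sequence length.

    Undetermined positions (N) count towards the total but not towards the calls.
    If tiled_length is omitted, the length of output_seq is used as the total.
    """
    seq = output_seq.strip().upper()
    total = len(seq) if tiled_length is None else tiled_length
    if total <= 0:
        raise ValueError("tiled length must be positive")
    called = sum(1 for base in seq if base in CALLED_BASES)
    return 100.0 * called / total


# Example: 6 called bases out of 10 positions -> 60.0
print(call_rate("ACGTNNNNAC"))
```

Passing the tiled length explicitly (425, 251 or 303 nt) keeps the denominator tied to the tiled sequence, as in the definition, even if an output file were truncated.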

Results
Initial validation of the 3rd generation pan-viral resequencing microarray for hantavirus detection and genetic characterization. The performance of the PathogenID v3.0 resequencing chip in hantavirus detection and genetic characterization was first evaluated using plasmids encoding prototype hantavirus sequences (PUUV, HTNV, TULV). The PUUV plasmid encompassing the N protein coding region (1831 nt) of the reference Sotkamo strain 2009 was hybridized to PathogenID v3.0, which includes 22 PUUV S segment sequences (seq222 to seq243) (Table 1; Supplement 1). For each of these sequences, Table 2 summarizes (i) the calculated sequence similarity (%) with the tested PUUV Sotkamo strain sequence; (ii) the percentage of determined nucleotides (CR) following its hybridization to the chip; and (iii) the result of the BLASTN analysis performed with the obtained raw sequences (Supplement 3).
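Table 2 reports the sequence similarity between the tested Sotkamo sequence and each tiled sequence, but the exact calculation is not specified in the text. As an illustration only, the sketch below assumes the two sequences have already been aligned and computes identity over ungapped alignment columns, which is one common convention; other definitions (for example counting gaps as mismatches) would give slightly different values. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity (%) between two pre-aligned sequences, ignoring gap columns ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # columns with a gap in either sequence are not compared
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


print(percent_identity("ACGTACGTAC", "ACGTACGTTT"))  # 8 of 10 columns identical -> 80.0
```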
The CR values clearly increased with the sequence similarity between the tested sequence and the tiled ones (Table 2). From 100% (seq225, Sotkamo itself) to 91.5% (seq231) identity, the CR was very high (97.7% to 75%, respectively) and BLASTN identified the Sotkamo strain without ambiguity. Down to 82.2% sequence identity (seq230), the CR remained above 34% and BLASTN still designated Sotkamo as the top hit, although some tiled sequences already gave an ambiguous identification (seq232) or even failed to identify it (seq226). Below 82% identity, the CR decreased dramatically and the tested sample failed to be identified in some cases. Plotting sequence identity against CR confirmed these observations (Fig. 2): below 80.7% identity between the tested and the tiled sequences, the microarray becomes inefficient for specific detection; the window between 80.7% and 83.2% identity is critical and variable, with outcomes ranging from no detection, to detection with non-specific determination, to precise genetic characterization of the tested sequence; above 83.2% identity, the microarray precisely identifies the tested sequence.
Colours, values and arrows outline a window of sequence identity (%) for BLASTN results obtained from each output sequence following hybridization: from no detection/identification (blue), to general PUUV characterization (Sotkamo + others, orange) to precise and exclusive characterization (Sotkamo, red). Complementary information is described in Table 2.
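The identity windows derived from Fig. 2 can be written as a simple decision rule. The sketch below encodes them; these thresholds were observed for the PUUV Sotkamo plasmid on this chip and are used here for illustration only, not as a validated general rule. The function name and outcome labels are hypothetical.

```python
LOWER_LIMIT = 80.7  # % identity below which specific detection failed
UPPER_LIMIT = 83.2  # % identity above which Sotkamo was identified precisely


def expected_outcome(identity_pct: float) -> str:
    """Map tested-vs-tiled identity (%) to the windows observed in the validation."""
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_pct < LOWER_LIMIT:
        return "no specific detection expected"
    if identity_pct <= UPPER_LIMIT:
        return "critical window: outcome variable"
    return "precise identification expected"


# Identity values quoted in the text for seq225, seq231 and seq230.
for seq_id, identity in [("seq225", 100.0), ("seq231", 91.5), ("seq230", 82.2)]:
    print(seq_id, identity, expected_outcome(identity))
```

In this rule seq230 (82.2%) falls in the critical window, consistent with the observation that identification was still achieved at that identity while neighbouring tiled sequences already gave ambiguous results.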
We verified that simultaneous detection was possible when plasmids containing the N protein coding region of the three hantavirus reference strains (PUUV Sotkamo 2009, TULV Moravia, HTNV 76/118) were mixed in a pool (Supplements 3-4). The CR values of the PUUV output sequences (seq222-seq243) were even higher (i.e. more nucleotides were determined) for the pool of the three viruses, most likely due to the cross-contribution of the three viruses to hybridization on the same tiled sequences, thus producing better BLASTN results (Supplements 3-4).
Mapping genetic diversity of hantaviruses circulating in Europe. The same technology validated with plasmids encoding the hantavirus N protein was applied to tissue samples of PUUV-, TULV-, DOBV-, SAAV-, SEOV- and TOPV-infected rodents originating from different endemic areas in Europe, as well as to supernatants of cells infected with laboratory strains (Tables 3-7; Supplement 5). Using RNA extracted from the lung and liver of PUUV RNA-positive bank voles captured in the Ardennes region of France in 2011 31 , we first observed that, at comparable viral RNA load (Ct value), lung performed better than liver for hantavirus investigation on the chip (data not shown). Lung-derived RNAs were therefore prioritized for further investigation.
Genetic characterization of Puumala viruses circulating in Europe. RNA extracts from the supernatant of Vero E6 cells infected with the PUUV Sotkamo strain, and RNA extracts from the lungs of seven PUUV-infected bank voles originating from France, Germany, Finland or Slovenia, were individually hybridized to the 22 PUUV sequences (seq222 to seq243) tiled on the PathogenID v3.0 resequencing microarray. Of note, only two of the tested samples had their exact sequence tiled on the chip: the PUUV Sotkamo strain (seq225) and the French Ardennes PUUV strain 87 (seq237). When a significant signal was detected, the corresponding raw sequence was subjected to a BLASTN enquiry for genetic typing (Table 3). Figure 3 shows the results in the context of a phylogenetic tree illustrating the currently known diversity of PUUV by combining reference sequences available in GenBank, sequences tiled on the chip (in red) and sequences of the tested viruses (in green). An unequivocal determination of the correct geographical variant was obtained for all the tested samples with at least one tiled sequence (red dots in Fig. 3), even when the corresponding sequence was not itself tiled on the chip. In very few cases (<8%, orange dots in Fig. 3), tiled sequences designated only an approximate origin, but always within the same genetic cluster.
The French Ardennes PUUV lung-derived variants 87 and 153 were precisely identified by BLASTN not only with the highly homologous French Ardennes PUUV tiled sequences (seq237-seq241), but also with tiled sequences from Belgium (seq222-seq223, seq230) or even from North-West Germany (seq229, for variant 153), which are phylogenetically more distant within the Central European (CE) clade. Similarly, the three PUUV variants from Germany were precisely characterized by 91% (10/11; Gilserberg), 83% (5/6; Weissach) and 50% (3/6; Bramsche) of the hybridizing homologous and heterologous tiled sequences from the Central European clade (Table 3). More interestingly, the Konnevesi variant from Finland was exactly identified not only with tiled sequences from its specific clade (Finland, FIN) but also with those from the Central European (CE) clade. Likewise, the variant 8098 from Slovenia (Alpe-Adrian clade, ALAD) was exactly identified with sequences from the ALAD clade, and also from the CE, FIN and Russia (RUS) clades. Finally, the laboratory PUUV Sotkamo strain was systematically identified by tiled sequences from almost all clades of the PUUV phylogenetic tree.
Of note, these very precise characterizations could be obtained despite very low CR values, for example as low as 13% of determined nucleotides between the tested PUUV variant Slovenia 8098 and the tiled seq226 (Kazan_Z84204) (Table 3). This is explained by the exact determination of short fragments (stretches) of significant sequence (typically a minimum of 15 nucleotides) that allow precise BLASTN identification despite poor CR values (Table 3; Supplement 5).
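Because BLASTN can anchor on short, fully determined fragments, it can be useful to locate such stretches in an output sequence. The sketch below finds runs of consecutive called bases of at least 15 nt, the minimum quoted above; the function name is hypothetical, and in the study the raw output sequence itself (including its N positions) was submitted to BLASTN.

```python
import re


def called_stretches(output_seq: str, min_len: int = 15) -> list:
    """Return (1-based start, fragment) for each run of >= min_len called bases."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    pattern = re.compile("[ACGT]{%d,}" % min_len)
    return [(m.start() + 1, m.group()) for m in pattern.finditer(output_seq.upper())]


# A mostly undetermined output with one informative 17-nt stretch.
example = "N" * 10 + "ACGTTGCAACGTTGCAA" + "NN" + "ACGT" + "N" * 5
print(called_stretches(example))  # [(11, 'ACGTTGCAACGTTGCAA')]
```

Such a profile makes it easy to see why a sample with a low overall CR can still yield an unambiguous BLASTN hit: a single long stretch carries most of the discriminating information.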
From Puumala virus to other orthohantaviruses circulating in Europe. The detection and genetic characterization potential of the PathogenID v3.0 resequencing chip was further evaluated for variants and laboratory strains of other hantavirus species circulating in Europe, namely DOBV, TULV, SEOV and TOPV. For this purpose, the remaining 30 hantavirus sequences tiled on the microarray (seq244 to seq273) were analyzed.
For DOBV, RNA extracts from two cell supernatants infected with the reference laboratory strains Belgrade and Saaremaa, and RNA extracts from two yellow-necked mice from Slovenia, were tested (Table 4). All of them were exactly characterized: (1) by the DOBV (seq245-seq246) and/or SAAV (seq247) sequences tiled on the chip, as expected from the genetic similarity of both viruses; (2) more interestingly, by tiled sequences from more distant orthohantavirus species such as HTNV and Soochong virus. Significant results were also observed for TULV RNA extracts from one laboratory strain and three common vole samples from Germany, Finland and Slovenia (Table 5). Following BLASTN, the correct sequence was either exclusively or at least predominantly characterized.
Here again, precise detection and identification were possible with tiled sequences from very distant clades of vole-associated New World orthohantaviruses such as Prospect Hill or Sin Nombre orthohantaviruses (Table 5). Finally, the SEOV present in RNA extracts from Norway rats from France and the UK, although unknown when the chip was designed, was perfectly identified by heterologous SEOV tiled sequences (Table 6). Similarly, for RNA extracts from a TOPV laboratory strain, in the absence of the corresponding sequence on the chip, the exact characterization was achieved with PUUV tiled sequences (seq228, seq234, seq325), even at a CR as low as 15% (Table 7).

Discussion
Hantaviruses are zoonotic agents distributed worldwide. In Europe, rodent-borne hantaviruses regularly cause episodes of Hemorrhagic Fever with Renal Syndrome (HFRS). Tools to survey hantavirus circulation, geographical distribution and genetic features in the animal reservoir are essential for a better understanding and prevention of hantavirus infection in humans. Resequencing microarrays have proven powerful for precisely identifying new genetic variants of emerging viruses [15][16][17][18][19][20][21] .
The present work represents a significant improvement of the PathogenID resequencing microarray, developed through a collaborative study, for the detection and identification of orthohantaviruses circulating in Europe. The 1st and 2nd generations of PathogenID made it possible to detect different viruses associated with hemorrhagic fevers, including hantaviruses; however, they also showed their limits for the detection of PUUV 20 , which is the most common and widespread European hantavirus, causing a mild form of HFRS, Nephropathia Epidemica (NE) [2][3][4][5][6][7][8] . We therefore switched strategy for the design of the 3rd generation PathogenID v3.0 used in this study. Tiled sequences no longer targeted the most conserved region of the genome, in the L segment 24 , but the S segment encoding the N protein, which presents two advantages: it is more efficient for discriminating variants within a species, and more of its sequences are available in GenBank. After a critical analysis of the taxonomy, 52 representative hantavirus sequences were selected among those available at the time the chip was designed (Table 1). The resequencing methodology (Supplement 2) allowed the recognition of both known viruses and previously unknown geographical variants.
Validation carried out using hantavirus N protein encoding plasmids showed a global tolerated divergence of up to 20% between the tiled and the tested sequences for a correct identification of the PUUV prototype strain Sotkamo (Table 1, Table 2, Fig. 2; Supplement 3). A synergistic effect in detection was observed when three viral species (PUUV, TULV, HTNV) were tested simultaneously (Supplement 4), most likely due to their cross-contribution in hybridizing conserved orthohantavirus nucleotides, favoring identification by BLASTN analysis.
Identification of the European hantaviruses PUUV, DOBV, TULV, SEOV and TOPV present in tissue samples or in supernatants of Vero E6 cells infected with laboratory strains was demonstrated not only with the homologous sequences tiled on the chip, but also with phylogenetically distant sequences (Tables 3-7; Fig. 3). As expected, the key factor for the precise characterization of the tested hantavirus sequence was its genetic distance from the tiled one (Fig. 3), underlining the importance of designing the chip from sequences encompassing the global diversity of hantaviruses. However, even for samples with a low Call Rate (CR: % of determined/total number of nucleotides following hybridization), precise taxonomic identification was possible when short specific fragments of significant sequence (about 15 nucleotides) were obtained for BLASTN analysis (Supplement 5). This was in particular the case for TOPV and the PUUV Konnevesi variant in the present study (Tables 3-7). A similar observation was previously reported for hemorrhagic fever viruses detected by PathogenID v2.0 20 . These short stretches of sequence, highly conserved among hantaviruses, could serve for developing other hybridization methods such as hybrid capture 43 .
Geographical variants of PUUV, the most frequent hantavirus in Europe, were correctly determined in the different endemic areas, such as France, Germany, Finland and Slovenia (Table 3, Fig. 3). SEOV isolates, recently shown to circulate both in France and in the UK [39][40][41] , were also precisely characterized, even though neither the Lyon nor the Cherwell isolate, respectively, was known when the microarray was designed (Table 6). Interestingly, both isolates were detected and correctly identified from two heterologous sequences tiled on the microarray, SEOV and HTNV (Table 6). In all cases, when a tiled sequence did not achieve the determination of the exact geographical variant, it at least designated the phylogenetic clade, and the number of sequences tiled on the microarray always allowed the deepest level of precision to be reached (Fig. 3).
Altogether, the results obtained with DOBV, TULV, SEOV and TOPV clearly outline the potential of PathogenID v3.0 to broadly explore the hantavirus genetic space and to deliver a precise identification of the species and local variants present in the infected tissue or cell supernatant. The suitability of this approach for mapping the wide diversity of hantaviruses within the European continent was demonstrated, including new variants unknown at the time the chip was designed. Detection by resequencing microarray, which is applicable to both animal and human samples, is of interest for both research and public health. Our results encourage extending the evaluation to other hantaviruses from different continents, both the pathogenic ones circulating in other endemic areas, such as the Americas, where they cause severe HCPS, and those hosted by other animal reservoirs such as insectivores and bats.
See Table 3 for details on the presented results.