Introduction

Hantaviruses are enveloped, tri-segmented, negative-stranded RNA viruses belonging to the Hantaviridae family, Bunyavirales order, according to the 2018 classification of the International Committee on Taxonomy of Viruses (ICTV)1. Hantaviruses are distributed worldwide, and their animal reservoirs are rodents, insectivores or bats (orders Rodentia, Soricomorpha, Chiroptera). When rodent-borne orthohantaviruses are transmitted to humans through aerosolized excreta of infected rodent reservoirs, they can cause Hemorrhagic Fever with Renal Syndrome (HFRS), its mild form Nephropathia Epidemica (NE), or Hantavirus Cardio-Pulmonary Syndrome (HCPS), depending mainly on the virus species and its genotype/strain. In Europe, cases of HFRS associated with Dobrava-Belgrade orthohantavirus (DOBV) and Seoul orthohantavirus (SEOV), and cases of NE associated with Puumala orthohantavirus (PUUV), are continuously reported in endemic areas such as the Balkans, Fennoscandia, Germany, Belgium, the Netherlands, France and, more recently, the United Kingdom2,3,4,5,6,7,8. Tools to investigate the genetic diversity of hantaviruses in their animal reservoirs across their geographical distribution are particularly needed to better understand the epidemiology of hantavirus disease, to predict outbreaks in humans, and to set up appropriate public health measures9,10,11,12,13,14.

The recent development of molecular methods that require no a priori hypothesis has brought a critical benefit to the diagnosis of, and research on, infectious diseases. These methods make it possible to obtain the sequence of a pathogen present in a biological sample in the absence of the specific probes required for classic PCR or hybridization15,16,17,18,19,20,21,22,23. Among them, Next Generation Sequencing (NGS) techniques have become widely used for the identification and whole-genome sequencing of novel animal or zoonotic pathogens and for metagenome analysis22,23. These techniques have become more accessible in terms of equipment and cost, but they still require complex downstream bioinformatics analysis, which can be a cumbersome step demanding specific expertise. Resequencing microarrays, on the other hand, do not demand sophisticated analysis for interpretation: the sequence obtained following hybridization of the amplified genetic material on the chip is used without any intermediate step for both BLASTN query and phylogenetic analysis. Although random amplification, unlike classic or multiplex PCR assays, does not specifically enrich the viral material present in the specimen, the subsequent hybridization step on the resequencing microarray further improves the sensitivity and specificity of the assay. In addition, the assay tolerates substantial sequence divergence, up to 20%, and permits precise characterization from a short stretch of about 20 detected nucleotides (nt)20, which can hardly be achieved when only a few reads are obtained by NGS. Resequencing microarrays have been broadly used over the last ten years for the detection and identification of emerging agents such as monkeypox virus17,18, pandemic influenza viruses19, hemorrhagic fever viruses20 and rhabdoviruses21.
Several steps of the assay can be performed in the field (mobile laboratory), from sample preparation and random amplification by Phi29 polymerase (stable at room temperature) up to hybridization on the microarray. Specific equipment (fluidics station and scanner) is required only for post-hybridization processing and scanning.

PathogenID is a collaborative effort of teams at Institut Pasteur that combined their expertise to develop three generations of resequencing microarrays for the detection of emerging pathogens, both bacteria and viruses, during public health emergencies and for research studies. The 1st and 2nd generations of the microarray contained only a limited number of viral sequences: 46 (PathogenID v1.0) and 126 (PathogenID v2.0), respectively. In particular, only sequences of six prototype orthohantaviruses, mainly associated with human diseases, were included: PUUV, DOBV, SEOV, Hantaan orthohantavirus (HTNV), Sin Nombre orthohantavirus (SNV) and Andes orthohantavirus (ANDV)17,20. The targeted sequence corresponded to part of the large (L) segment coding for the RNA-dependent RNA polymerase (RdRp), reputed to be the most conserved region of the genome24. While PathogenID v2.0 was validated for the detection of DOBV, SEOV, HTNV, ANDV and SNV in infected Vero E6 cells, it was not validated for PUUV, even though PUUV is the most commonly circulating orthohantavirus in Europe20.

Therefore, the 3rd generation resequencing microarray (PathogenID v3.0), employed in the present cooperative study at the European scale, was specialized for virus detection and contains more than 800 viral sequences. This pan-viral resequencing microarray covers many viral pathogens critical for both animal and public health, in particular most of the known circulating species and variants of zoonotic viruses. Among those sequences, 52 were strategically chosen to cover the diversity of hantaviruses, in particular the most frequent rodent-borne species, which are the only ones reported to date to have zoonotic potential. The small (S) segment of the genome, encoding the nucleocapsid (N) protein, was chosen as the target for probe design because it discriminates variants within a species more efficiently; in addition, more reference sequences were available in GenBank when the pan-viral chip was designed.

The present manuscript illustrates the use of the PathogenID v3.0 resequencing microarray to map the genetic diversity of several endemic hantaviruses mainly associated with human disease in Europe, namely PUUV, DOBV (including genotypes Dobrava and Saaremaa25), Tula orthohantavirus (TULV), SEOV and Topografov hantavirus (TOPV), according to their geographical distribution.

Methods

Design of PathogenID v3.0 pan-viral resequencing microarray

The objective in designing the 3rd generation resequencing microarray (PathogenID v3.0) was to reach the widest coverage of virus diversity of both medical and veterinary interest. Because of the technical limits of the microarray, we selected the minimum number of probes for each viral species/variant in order to include the highest number of sequences. For each viral family, the included probes were chosen taking into account the reference sequences published in GenBank and our experience from the earlier generations of the microarray (PathogenID v1.0 and PathogenID v2.0)17,18,19,20,21.

PathogenID v3.0 contains 838 sequences, including virus prototypes and their variants belonging to different families (complete list available upon request to the authors). Regarding hantaviruses, 52 partial sequences of the S segment encoding the N protein were included (Table 1; Supplement 1), according to the ICTV taxonomy in force when the chip was designed (i.e. when the probe sequences were chosen). The length of the tiled sequences was: i) 425 nucleotides (nt) for those available in GenBank, with the exception of seq234 (251 nt); ii) 303 nt for those sequenced in the laboratory during previous analyses and not published at the time of the study (Table 1; Supplement 1).

Table 1 List of the 52 hantavirus sequences tiled on the pan-viral chip PathogenID v3.0.

Upon sequence selection, PathogenID v3.0 was manufactured by Affymetrix (Santa Clara, California) according to their high-density resequencing approach, which uses stepwise overlapping 25-nt probes: the first covers positions 1–25 of the tiled sequence, the second positions 2–26, and so on. Each probe comprises a set of 4 alleles differing at the central 13th position (i.e. A, C, G or T)16,17. A total of 2.5 million 25-mer oligonucleotides were tiled on the PathogenID v3.0 microarray.
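
The tiling scheme described above can be sketched in a few lines of Python. This is an illustrative reconstruction only, not the manufacturer's design software: the function name `tile_probes`, the restriction to a single strand and the toy input sequence are assumptions made for the sketch.

```python
from typing import Dict, List

BASES = "ACGT"
PROBE_LEN = 25   # probe length stated in the text
CENTER = 12      # 0-based index of the central 13th position


def tile_probes(seq: str) -> List[Dict[str, str]]:
    """Return one probe set per 25-nt window, sliding by 1 nt.

    Each probe set maps a base (A, C, G, T) to the 25-mer that carries
    that base at the central position. Only one strand is tiled here
    (assumption made for this sketch).
    """
    seq = seq.upper()
    probe_sets = []
    for start in range(len(seq) - PROBE_LEN + 1):
        window = seq[start:start + PROBE_LEN]
        probe_sets.append(
            {b: window[:CENTER] + b + window[CENTER + 1:] for b in BASES}
        )
    return probe_sets


# Hypothetical 425-nt sequence, used only to count probe sets.
example = "ACGT" * 106 + "A"
sets_425 = tile_probes(example)
print(len(sets_425), "probe sets,", 4 * len(sets_425), "probes")
```

Under these assumptions, a 425-nt tiled sequence yields 401 probe sets (425 − 25 + 1), each made of 4 allelic probes; the allele matching the hybridized target at the central position is the one expected to give the strongest signal.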

Hantavirus plasmids

Synthetic plasmids containing the S segment of PUUV, the most endemic hantavirus in Europe (strain Sotkamo 2009), HTNV (strain 76/118) and TULV (strain Moravia) were used in a preliminary assay to evaluate the potential of the microarray for hantavirus detection, prior to testing the field animal samples. The PUUV plasmid was taken as the reference and tested alone and in a pool with the HTNV and TULV plasmids.

Laboratory strains

Laboratory prototype strains of PUUV, DOBV (genotypes Dobrava and Saaremaa), TULV and TOPV, isolated on Vero E6 cells in BSL3 containment, were provided by the participating virology laboratory in Finland26,27,28,29,30 (Fig. 1).

Figure 1

Cooperative work for the genetic characterization of endemic hantaviruses in Europe. Hantaviruses from rodents, namely Bank vole (Myodes glareolus), Common vole (Microtus arvalis), Yellow-necked mouse (Apodemus flavicollis), Striped field mouse (Apodemus agrarius) and Norway rat (Rattus norvegicus), captured in different endemic areas in Europe were genetically characterized using a resequencing pan-viral chip in a cooperative study among European research institutes from France, Germany, Scandinavia, the Balkans and the United Kingdom. Laboratory viral strains were also obtained from one of the participating laboratories. The tested RNA included: (i) RNA extracted from tissues (e.g. lung) of infected wild rodents; (ii) RNA extracted from supernatants of inoculated cells (Vero E6).

Endemic hantaviruses from wild rodents captured in Europe

The hantavirus rodent reservoirs Bank vole (Myodes glareolus), Common vole (Microtus arvalis), Yellow-necked mouse (Apodemus flavicollis), Striped field mouse (Apodemus agrarius) and Norway rat (Rattus norvegicus) had previously been captured in different hantavirus-endemic areas in Europe: France, Germany, Finland, Slovenia and the United Kingdom30,31,32,33,34,35,36,37,38,39,40,41. Figure 1 shows the capture areas and rodent species, as well as the institutes that participated in the study and provided positive animals.

Lung samples were kept at −80 °C in RNAlater (Ambion, Thermo Fisher Scientific, Waltham, USA) prior to RNA extraction (QIAamp Viral RNA Kit, Qiagen, Hilden, Germany). Hantavirus-positive RNAs, routinely confirmed by RT-PCR in the reference laboratories30,31,32,33,34,35,36,37,38,39,40,41, were used to explore their genetic diversity by resequencing microarray.

France

PUUV, the most frequent hantavirus circulating in France, was obtained from bank vole samples captured in 2011 in the Ardennes region31, and SEOV (Lyon strain) from Norway rats39.

Finland for Fennoscandia region

Lung tissues originating from PUUV-positive bank voles captured in Konnevesi in 2008 were used30.

Germany

PUUV and TULV RNAs originating from lungs of animal reservoirs captured across Germany32,33,34,35,36 were used.

Balkan region

PUUV, TULV, DOBV RNAs originating from lungs were obtained from rodents captured in Slovenia37,38.

United Kingdom

SEOV (Cherwell strain) obtained from organs of domestic rats in the United Kingdom (UK) was used40,41.

All experiments were performed in accordance with relevant guidelines and regulations. RNAs extracted from tissue samples of wild animals were received from various Hantavirus Reference Centers in Europe (Fig. 1) through the EU program EVA (European Virus Archive - n° 228292), which facilitates access to virus/tissue libraries under MTA (Material Transfer Agreement). All of these samples have been described previously in peer-reviewed literature. All handling procedures of captured rodents followed the regulations of each respective country. The species studied are not protected, and all efforts were made to minimize animal suffering.

Genetic detection and phylogeographical characterization of hantaviruses by the pan-viral chip PathogenID v3.0

The experimental procedure for hantavirus typing includes the following steps, detailed in previous works19,20 (Supplement 2).

RNA extraction

RNA was extracted using the QIAamp Viral RNA Extraction kit (Qiagen) from either animal organ homogenates or cell supernatants. cDNA was synthesized with the Superscript III system (Invitrogen, Thermo Fisher).

Random amplification

Genetic material, either plasmid DNA, or cDNA from animal organs or cell supernatants, was amplified by WGA (Whole Genome Amplification) and WTA (Whole Transcriptome Amplification) approaches, respectively, using ϕ29 polymerase-mediated random amplification (Qiagen), followed by a ligation step17.

Hybridization on the microarray and sequence detection

Amplified products were fragmented and labeled using the GeneChip Resequencing Reagent Kit (Affymetrix), then hybridized overnight at 45 °C on the PathogenID v3.0 microarray. Chips were then subjected to washing, fluorescence detection and scanning using the Affymetrix equipment (Wash Control, Scan Control). Resequencing analysis was performed using the GSEQ 4.1 software (Affymetrix). For each of the 52 hantavirus sequences tiled on the chip (positions seq222 to seq273), an output sequence was obtained in .txt format, with determined (A, G, T or C) or non-determined (N) positions (example in Supplement 2). Significant sequences obtained were used for Call Rate calculation and BLASTN analysis.

Call Rate calculation

Call Rate (CR) was calculated as the ratio (%) between the number of determined (‘called’) nucleotides (i.e. A, G, T or C) following hybridization and the total number of nucleotides of each tiled sequence (i.e. 425, 251 or 303 nt for hantaviruses).
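
As a worked illustration of this definition, the short Python sketch below computes the CR of an output sequence made of determined bases and N positions. The function name `call_rate` and the toy sequence are hypothetical; the resequencing software used in the study is not reproduced here.

```python
def call_rate(output_seq: str) -> float:
    """Percentage of determined (A, C, G, T) positions in an output sequence.

    The denominator is the full length of the output sequence, which is
    assumed here to equal the length of the tiled sequence.
    """
    seq = output_seq.strip().upper()
    if not seq:
        raise ValueError("empty output sequence")
    called = sum(1 for base in seq if base in "ACGT")
    return 100.0 * called / len(seq)


# Toy output sequence (hypothetical): 4 called bases out of 8 positions.
print(call_rate("ACNNGTNN"))  # 50.0
```

With this definition, an output sequence consisting only of N positions has a CR of 0%, and a fully determined sequence has a CR of 100%.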

BLASTN analysis

For samples presenting a minimum stretch of informative sequence20, the entire raw sequence was submitted as a BLASTN (Basic Local Alignment Search Tool Nucleotide) query [National Center for Biotechnology Information, National Institutes of Health] against all the sequences present in GenBank. When BLASTN returned a positive result, the scores, identity (%), query coverage (%) and e-value were taken into account.

Phylogenetic analysis

Phylogenetic analysis was performed for PUUV (as representative of European orthohantavirus) by using (i) the reference sequences (S segment) available on GenBank, (ii) the sequences tiled on the chip, (iii) the sequences corresponding to the tested hantaviruses, when known.

First, a phylogenetic tree was constructed from the complete coding part of the reference sequences by the maximum-likelihood (ML) method with PhyML v3.0, implemented in Seaview (v.4.6.1), under the most appropriate nucleotide substitution model as determined by the SMS program (available online at http://www.atgc-montpellier.fr/sms/)42. Branch supports were evaluated by the approximate likelihood-ratio test (aLRT SH-like). Then, the short sequences tiled on the chip and those of the tested hantaviruses were placed in this backbone tree using RAxML, available online on the CIPRES portal (http://www.phylo.org). Branch supports of these phylogenetic placements were evaluated by the rapid bootstrap procedure with the MRE-based bootstopping criterion, as recommended in the online version of the software.

For each sample giving a positive result following hybridization, the BLASTN result obtained from the output sequence was reported on the phylogenetic tree and compared with the tiled sequences that had permitted detection and genetic characterization.

Results

Initial validation of the 3rd generation pan-viral resequencing microarray for hantavirus detection and genetic characterization

The performance of the PathogenID v3.0 resequencing chip in hantavirus detection and genetic characterization was first evaluated using plasmids encoding prototype hantavirus sequences (PUUV, HTNV, TULV). The PUUV plasmid encompassing the N protein coding region (1831 nt) of the reference Sotkamo strain 2009 was hybridized to PathogenID v3.0, which includes 22 PUUV S segment sequences (seq222 to seq243) (Table 1; Supplement 1). For each of these sequences, Table 2 summarizes (i) the calculated sequence similarity (%) with the tested PUUV Sotkamo strain sequence; (ii) the percentage of correctly identified nucleotides (CR) following hybridization to the chip; and (iii) the result of the BLASTN analysis performed with the raw sequences obtained (Supplement 3).

Table 2 Validation of the PathogenID v3.0 resequencing microarray using a nucleocapsid (N) protein encoding plasmid of the PUUV prototype strain Sotkamo.

The CR values clearly increased with the sequence similarity between the tested sequence and the tiled ones (Table 2). From 100% (seq225, Sotkamo itself) down to 91.5% (seq231) identity, the CR was very high (97.7% to 75%, respectively) and BLASTN identified the Sotkamo strain without ambiguity. Down to 82.2% sequence identity (seq230), the CR remained above 34% and BLASTN still designated Sotkamo as the top hit, although some tiled sequences already gave an imprecise identification (seq232) or failed to identify the strain (seq226). Below 82% identity, the CR decreased dramatically and the tested sample failed to be identified in some cases. Plotting sequence identity against CR confirmed these observations (Fig. 2): below 80.7% identity between the tested and the tiled sequences, the microarray becomes inefficient for specific detection; between 80.7% and 83.2% identity, the outcome is variable, ranging from no detection, through detection with non-specific determination, to precise genetic characterization of the tested sequence; above 83.2% identity, the microarray identifies the tested sequence precisely.
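
The identity windows reported above can be summarized as a simple decision rule. The Python sketch below is only a restatement of the two thresholds given in the text (80.7% and 83.2%) for the PUUV Sotkamo validation; the function name `expected_outcome`, the category labels and the treatment of values falling exactly on a threshold are assumptions, and the rule is not claimed to generalize beyond this experiment.

```python
LOWER_IDENTITY = 80.7  # below this, no specific detection (from the text)
UPPER_IDENTITY = 83.2  # above this, precise identification (from the text)


def expected_outcome(identity_pct: float) -> str:
    """Map tested-vs-tiled sequence identity (%) to the expected outcome.

    Values equal to either threshold are placed in the intermediate
    window (assumption: the text does not specify boundary handling).
    """
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity must be between 0 and 100")
    if identity_pct < LOWER_IDENTITY:
        return "no specific detection"
    if identity_pct <= UPPER_IDENTITY:
        return "variable outcome"
    return "precise identification"


for identity in (100.0, 91.5, 82.2, 75.0):
    print(identity, "->", expected_outcome(identity))
```

For example, the 82.2% identity of seq230 falls in the intermediate window, in line with the mixed results described for sequences in that range.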

Figure 2

Performance of the PathogenID v3.0 resequencing microarray in detecting and identifying hantavirus sequences. A plasmid encoding the nucleocapsid (N) protein of the PUUV prototype strain Sotkamo was used to evaluate the range of detection of the PathogenID v3.0 resequencing microarray. This figure compares the calculated sequence identity (%) between the tested sequence (PUUV Sotkamo) and each PUUV sequence tiled on the microarray with the respective Call Rate (CR, i.e. % of determined nucleotides) obtained after hybridization. Colours, values and arrows outline a window of sequence identity (%) for BLASTN results obtained from each output sequence following hybridization: from no detection/identification (blue), to general PUUV characterization (Sotkamo + others, orange), to precise and exclusive characterization (Sotkamo, red). Complementary information is described in Table 2 and Supplements 3–4.

We verified that simultaneous detection was possible when plasmids containing the N protein coding region of the three hantavirus reference strains (PUUV Sotkamo 2009, TULV Moravia, HTNV 76/118) were pooled (Supplements 3–4). The CR values of the PUUV output sequences (seq222–seq243) were even higher (i.e. more nucleotides were determined) for the pool of the three viruses, most likely because all three viruses contributed to hybridization on the same tiled sequence, which improved the BLASTN results (Supplements 3–4).

Mapping genetic diversity of hantaviruses circulating in Europe

The technology validated with plasmids encoding the hantavirus N protein was then applied to tissue samples of rodents infected with PUUV, TULV, DOBV, Saaremaa virus (SAAV), SEOV or TOPV originating from different endemic areas in Europe, as well as to supernatants of cells infected with laboratory strains (Tables 3–7; Supplement 5). Using RNA extracted from the lung and liver of PUUV RNA-positive bank voles captured in 2011 in the Ardennes region of France31, we first observed that, at comparable viral RNA load (Ct value), lung gave better results than liver for hantavirus investigation on the chip (data not shown). Therefore, lung-derived RNAs were prioritized for further investigation.

Table 3 Detection and genetic characterization of Puumala orthohantaviruses (PUUV) from various endemic areas in Europe: PUUV isolates or laboratory strains were hybridized to the hantavirus sequences tiled on the PathogenID v3.0 resequencing microarray (seq222 to seq273; Table 1; Supplement 1). For each tested sample, the tiled sequences giving a significant signal are listed and the Call Rate (CR, i.e. % of correctly determined nucleotides) is indicated. The corresponding raw sequences were subjected to BLASTN query. The number (nbr) of identical BLASTN results is indicated; the most closely related sequence is indicated in italics and up to the 3 closest sequences are listed. “Total Score” evaluates the overall quality of the alignment by BLASTN. Complementary data, including all sequences obtained following BLASTN analysis and further details, are described in Supplement 5.
Table 4 Detection and genetic characterization of Dobrava orthohantaviruses (DOBV) from various endemic areas in Europe: DOBV isolates or laboratory strains were hybridized to the hantavirus sequences tiled on the PathogenID v3.0 resequencing microarray (seq222 to seq273; Table 1; Supplement 1).
Table 5 Detection and genetic characterization of Tula orthohantaviruses (TULV) from various endemic areas in Europe: TULV isolates or laboratory strains were hybridized to the hantavirus sequences tiled on the PathogenID v3.0 resequencing microarray (seq222 to seq273; Table 1; Supplement 1).
Table 6 Detection and genetic characterization of Seoul orthohantaviruses (SEOV) from various endemic areas in Europe: SEOV isolates were hybridized to the hantavirus sequences tiled on the PathogenID v3.0 resequencing microarray (seq222 to seq273; Table 1; Supplement 1).
Table 7 Detection and genetic characterization of Topografov orthohantavirus (TOPV): the laboratory strain Topografov was hybridized to the hantavirus sequences tiled on the PathogenID v3.0 resequencing microarray (seq222 to seq273; Table 1; Supplement 1).

Genetic characterization of Puumala viruses circulating in Europe

RNA extracts from the supernatant of Vero E6 cells infected with the PUUV Sotkamo strain, and RNA extracts from the lungs of seven PUUV-infected bank voles originating from France, Germany, Finland or Slovenia, were individually hybridized to the 22 PUUV sequences (seq222 to seq243) tiled on the PathogenID v3.0 resequencing microarray. It is of note that only two of the tested samples had their exact sequence tiled on the chip: the PUUV Sotkamo strain (seq225) and the French Ardennes PUUV strain 87 (seq237). When a significant signal was detected, the corresponding raw sequence was subjected to BLASTN query for genetic typing (Table 3). Figure 3 presents the results in the context of a phylogenetic tree illustrating the currently known diversity of PUUV by combining reference sequences available in GenBank, sequences tiled on the chip (in red) and sequences of the tested viruses (in green). An unequivocal determination of the correct geographical variant was obtained for all the tested samples with at least one tiled sequence (red dots in Fig. 3), even when the corresponding sequence was not itself tiled on the chip. In very few cases (<8%, orange dots in Fig. 3), tiled sequences designated only an approximate origin, although always within the same genetic cluster.

Figure 3

Phylogenetic analysis of European Puumala orthohantavirus (PUUV) genetically characterised by the PathogenID v3.0 resequencing microarray. A phylogenetic tree backbone was first constructed from complete S segment coding sequences of reference hantaviruses available in GenBank using the maximum-likelihood (ML) method with PhyML v3.0, implemented in Seaview (v.4.6.1), with the most appropriate substitution model as determined by the SMS program. Shorter sequences (tiled on the chip and tested viruses) were then placed in the backbone tree using RAxML. Nodes with branch support values > 0.8 are indicated by a red point (reference tree, aLRT branch support test) or by a blue point (phylogenetic placement of short sequences in the reference tree, rapid bootstrap procedure with MRE-based bootstopping criterion). The scale bar represents the average number of substitutions per site. The main geographical clusters of PUUV in Europe are indicated under the tree: CE (Central Europe); ALAD (Alpe-Adrian); S-SCA (South Scandinavia); FIN (Finland); RUS (Russia); LAT (Latvia); N-SCA (North Scandinavia); DEN (Denmark). The 22 PUUV sequences tiled on the chip (seq222 to seq243) are outlined in red, those of the 8 tested viruses in green. In the upper grid, the 8 tested Puumala viruses (the laboratory PUUV Sotkamo strain and 7 European geographical variants) are presented with their exact position in the tree (green diamond) and the tiled sequences that permitted their detection (significant CR), with exact (red circles) or close (orange circles) genetic identification by BLASTN. Complementary information is described in Table 3 and Supplement 5.

The French Ardennes PUUV lung-derived variants 87 and 153 were precisely identified by BLASTN not only with the highly similar French Ardennes PUUV tiled sequences (seq237-seq241), but also with tiled sequences from Belgium (seq222-seq223, seq230) or even from North-West Germany (seq229, for variant 153), which are phylogenetically more distant within the Central European (CE) clade. Similarly, the three PUUV variants from Germany were precisely characterized by 91% (10/11; Gilserberg), 83% (5/6; Weissach) and 50% (3/6; Bramsche) of the hybridizing homologous and heterologous tiled sequences from the Central European clade (Table 3). More interestingly, the Konnevesi variant from Finland was exactly identified not only with tiled sequences from its specific clade (Finland, FIN) but also with sequences from the Central European (CE) clade. Likewise, the variant 8098 from Slovenia (clade Alpe-Adrian, ALAD) was exactly identified with sequences from the ALAD clade, and also from the CE, FIN and Russia (RUS) clades. Finally, the laboratory PUUV Sotkamo strain was systematically identified by tiled sequences from almost all clades of the PUUV phylogenetic tree.

It is of note that these very precise characterisations could be obtained despite very low CR values, for example down to only 13% of determined nucleotides between the tested PUUV variant Slovenia 8098 and the tiled seq226 (Kazan_Z84204) (Table 3). This is explained by the exact determination of short fragments (stretches) of significant sequence (a minimum of about 15 nucleotides), which allow precise BLASTN identification despite poor CR values (Table 3; Supplement 5).
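
To illustrate why a low CR does not preclude identification, the Python sketch below extracts uninterrupted stretches of determined nucleotides from an output sequence. It is a hypothetical helper, not part of the published pipeline: the function name `called_stretches`, the default minimum length of 15 nt (taken from the text) and the toy sequence are assumptions.

```python
import re
from typing import List, Tuple


def called_stretches(output_seq: str, min_len: int = 15) -> List[Tuple[int, str]]:
    """Return (0-based start, stretch) for each run of called bases.

    A run is a maximal block of A/C/G/T uninterrupted by N positions;
    only runs of at least `min_len` nucleotides are kept.
    """
    pattern = re.compile(r"[ACGT]{%d,}" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(output_seq.upper())]


# Toy output sequence (hypothetical): mostly N, with one 16-nt called stretch.
toy = "N" * 40 + "ACGTTGCAACGTTGCA" + "N" * 30 + "ACG" + "N" * 31
called = sum(base in "ACGT" for base in toy)
print("CR = %.1f%%" % (100.0 * called / len(toy)))
print(called_stretches(toy))
```

In this toy case fewer than one position in six is determined, yet a single 16-nt stretch remains available as a query; short isolated calls such as the 3-nt block are discarded.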

From Puumala virus to other orthohantaviruses circulating in Europe

The potential of the PathogenID v3.0 resequencing chip for detection and genetic characterization was further evaluated for variants and laboratory strains of other hantavirus species circulating in Europe, namely DOBV, TULV, SEOV and TOPV. For this purpose, 30 additional hantavirus sequences (seq244 to seq273) were tiled on the microarray.

For DOBV, RNA extracts from two supernatants of cells infected with the reference laboratory strains Belgrade and Saaremaa, and RNA extracts from two Yellow-necked mice from Slovenia, were tested (Table 4). All of them were exactly characterized: (1) by the DOBV (seq245-seq246) and/or SAAV (seq247) sequences tiled on the chip, as expected from the genetic similarity of both viruses; (2) more interestingly, by tiled sequences from more distant orthohantavirus species such as HTNV and Soochong virus. Significant results were also observed for TULV RNA extracts from one laboratory strain and three Common vole samples from Germany, Finland and Slovenia (Table 5). Following BLASTN, the correct sequence was either the exclusive or at least the dominant result. Here again, precise detection and identification were possible with tiled sequences from very distant clades of vole-associated New World orthohantaviruses such as Prospect Hill or Sin Nombre orthohantaviruses (Table 5). Finally, the SEOV present in RNA extracts from Norway rats from France and the UK, although unknown when the chip was designed, was perfectly identified by heterologous SEOV tiled sequences (Table 6). Similarly, for RNA extracts from a TOPV laboratory strain, in the absence of the corresponding sequence tiled on the chip, exact characterization was achieved with PUUV tiled sequences (seq228, seq234, seq325) even at a low CR of 15% (Table 7).

Discussion

Hantaviruses are zoonotic agents distributed worldwide. In Europe, rodent-borne hantaviruses regularly cause episodes of Hemorrhagic Fever with Renal Syndrome (HFRS). Tools to survey hantavirus circulation, geographical distribution and genetic features in the animal reservoir are essential for better understanding and prevention of hantavirus infection in humans. Resequencing microarrays have been shown to be powerful tools for precisely identifying new genetic variants of emerging viruses15,16,17,18,19,20,21.

The present work represents a significant improvement of the PathogenID resequencing microarray, developed through a collaborative study, for the detection and identification of orthohantaviruses circulating in Europe. The 1st and 2nd generations of PathogenID detected different viruses associated with hemorrhagic fevers, including hantaviruses; however, they also showed their limits for the detection of PUUV20, which is the most common and widespread European hantavirus and causes a mild form of HFRS, Nephropathia Epidemica (NE)2,3,4,5,6,7,8. Therefore, we changed the strategy for the design of the 3rd generation PathogenID v3.0 used in this study. Tiled sequences no longer target the most conserved region of the genome, the L segment24, but the S segment encoding the N protein, which presents two advantages: it discriminates variants within a species more efficiently, and more sequences are available in GenBank. After a critical analysis of the taxonomy, 52 representative hantavirus sequences were selected among those available when the chip was designed (Table 1). The resequencing methodology (Supplement 2) recognized both known viruses and previously unknown geographical variants.

Validation using plasmids encoding the hantavirus N protein showed that a global divergence of up to 20% between the tiled and the tested sequences was tolerated for correct identification of the PUUV prototype strain Sotkamo (Table 1, Table 2, Fig. 2; Supplement 3). A synergistic effect on detection was observed when three viral species (PUUV, TULV, HTNV) were tested simultaneously (Supplement 4), most likely due to their cross-contribution in hybridizing to conserved orthohantavirus nucleotides, which favors identification by BLASTN analysis.

Identification of the European hantaviruses PUUV, DOBV, TULV, SEOV and TOPV, present in tissue samples or in supernatants of Vero E6 cells infected with laboratory strains, was demonstrated not only with the homologous sequences tiled on the chip, but also with phylogenetically distant sequences (Tables 3–7; Fig. 3). The key factor for precise characterization of the tested hantavirus sequence was its genetic distance from the tiled one (Fig. 3), which underlines the importance of designing the chip from sequences encompassing the global diversity of hantaviruses. However, even for samples with a low Call Rate (CR: % of determined nucleotides out of the total number of nucleotides following hybridization), precise taxonomic identification was possible when short specific fragments of significant sequence (about 15 nucleotides) were obtained for BLASTN analysis (Supplement 5). This was the case in particular for TOPV and the PUUV Konnevesi variant in the present study (Tables 3–7). A similar observation was previously reported20 for hemorrhagic fever viruses detected by PathogenID v2.0. These short stretches of sequence, highly conserved among hantaviruses, could serve to develop other hybridization methods such as hybrid capture43.

Geographical variants of PUUV, the most frequent hantavirus in Europe, were correctly determined in the different endemic areas, such as France, Germany, Finland and Slovenia (Table 3, Fig. 3). SEOV, recently shown to circulate in both France and the UK39,40,41, was also precisely characterized even though neither the Lyon nor the Cherwell isolate, respectively, was known when the microarray was designed (Table 6). Interestingly, both isolates were detected and correctly identified from two heterologous sequences tiled on the microarray, SEOV and HTNV (Table 6). In all cases, when a tiled sequence did not determine the exact geographical variant, it at least designated the phylogenetic clade, and the number of sequences tiled on the microarray always allowed the deepest level of precision to be reached (Fig. 3).

Altogether, the results obtained with DOBV, TULV, SEOV and TOPV clearly outline the potential of PathogenID v3.0 to explore the hantavirus genetic space broadly and to deliver precise identification of the species and local variants present in infected tissue or cell supernatant. The suitability of this approach was demonstrated by mapping the wide diversity of hantaviruses within the European continent, including new variants unknown when the chip was designed. Detection by resequencing microarray, which is applicable to both animal and human samples, is of interest for both research and public health. Our results encourage extending the evaluation to other hantaviruses from different continents, both the pathogenic ones circulating in other endemic areas, such as the Americas, where they cause severe HCPS, and those found in other animal reservoirs such as insectivores and bats.