Recombination at the emergence of the pathogenic rabbit haemorrhagic disease virus Lagovirus europaeus/GI.2

Rabbit haemorrhagic disease is a viral disease that emerged in the 1980s and causes high mortality and morbidity in the European rabbit (Oryctolagus cuniculus). In 2010, a new genotype of the rabbit haemorrhagic disease virus emerged and replaced the former circulating Lagovirus europaeus/GI.1 strains. Several recombination events have been reported for the new genotype Lagovirus europaeus/GI.2, with pathogenic (variants GI.1a and GI.1b) and benign (genotype GI.4) strains that served as donors for the non-structural part while GI.2 composed the structural part; another recombination event has also been described at the p16/p23 junction involving GI.4 strains. In this study, we analysed new complete coding sequences of four benign GI.3 strains and four GI.2 strains. Phylogenetic and recombination detection analyses revealed that the first GI.2 strains, considered as non-recombinant, resulted from a recombination event between GI.3 and GI.2, with GI.3 as the major donor for the non-structural part and GI.2 for the structural part. Our results indicate that recombination contributed to the emergence, persistence and dissemination of GI.2 as a pathogenic form and that all described GI.2 strains so far are the product of recombination. This highlights the need to study full-genomic sequences of lagoviruses to understand their emergence and evolution.

The genomes of the earliest GI.2 strains 10-28 and 10-32 were 7,448 nucleotides (nt) long and shared 94.6% nucleotide identity. The length of the 5′ UTR sequences is 9 nt and that of the 3′ UTR is 70 nt. The closest strains were GI.3 strains with an average nucleotide identity of 93.8% (94.3% for the non-structural part of the genomes), while the average identity with GI.1 strains was 85% (86.4% for the non-structural part of the genomes). The genomes of the GI.3 strains were 7,427 nt long (excluding the 5′ UTR), except the CHA20/09-100 strain that was 7,424 nt long. The 3′ UTR sequences had the same length (64 nt). When including the GI.3 06-11 sequence (GenBank accession number MN737115), the average nucleotide identity between the five GI.3 strains was 97.1%. The CDS of the four GI.2 showed between 93.2% and 94.5% of nucleotide identity (93.2% between 16-35 and 16-36 strains), and when compared to GI.1 CDS presented no deletions or insertions (7,369 nt long), while the GI.3 CDS presented a deletion of six consecutive nucleotides within the capsid gene (7,363 nt long instead of 7,369 nt). This deletion is present in the benign strains 06-11, Ashington and the Italian RCV (GenBank accession numbers: EF558587 and X96868) 2, 4, 7 and comprises a codon putatively under positive selection in GI.1 strains 31 . The strain CHA20/09-100 had a further three nucleotide deletion in the minor structural protein VP10 (positions 193-195, amino acid 65).
The phylogenetic analysis was performed taking into account the location of the recombination breakpoint detected by RDP at the junction of the non-structural and structural parts. In the ML tree of the structural part, which includes VP60 and VP10 (Fig. 2i), the GI.3 strains formed a strongly supported group (bootstrap value 100), that was a sister group to GI.1 (bootstrap value 100). The MRCV strain, a GI.4-like/GI.3-like recombinant 33 , also grouped with the GI.3 strains (bootstrap value 77). According to the nomenclature recently proposed for lagoviruses, MRCV does not meet the criteria to be considered as a member of the GI.3 group and is currently unclassified 9 . Indeed, the genetic distance between MRCV and GI.3 VP60 gene sequences is > 15% (data not shown) and no other similar strains (from independent outbreaks) had been identified. As for the four GI.2 sequences obtained in this study, they clustered within a highly supported group (bootstrap value 100) formed  25 , GI.2 strains previously considered as non-recombinants (marked with an *), and the 2016 strains from Canada (Canada2016) and the Netherlands (NL2016).
The ML tree for the non-structural protein p16 (see supplementary information), recapitulated most of the results observed in the ML tree for the non-structural part with the exception of the positioning of the Portuguese strains SOS137, SOS164, SOS148, SOS149, SOS151, SOS173, SOS404 and SOS468, that appear clustered with GI.4. Strains SOS137 and SOS164 were previously shown to be triple recombinants between GI.4, GI.1b and GI.2 (GI.4 for p16, GI.1b for the remaining non-structural proteins and GI.2 for the structural proteins) 25 which is in agreement with the results obtained in this study. As for strains SOS148, SOS149, SOS151, SOS173, SOS404 and SOS468, the tree suggests that their genome also originated from three different strains: GI.4 for p16, GI.3 for the remaining non-structural proteins and GI.2 for the structural proteins. In this case, it most likely involved recombination between a GI.4 strain and a GI.3/GI.2 recombinant strain. A tanglegram analysis and RDP further confirmed these results (see supplementary information).

Discussion
GI.2 was identified as a novel pathogenic form of lagovirus in France in 2010 8 . Field observations and further characterisation of the strains revealed unique characteristics in comparison with former strains such as the ability to cross the species boundaries [13][14][15][16][17] and to kill young rabbits 11,12 . Previous studies also revealed the occurrence of recombination of GI.2 strains with non-pathogenic (GI.4) and pathogenic (GI.1a and GI.1b) strains 24-26 , showing an important role of recombination in generating diversity in GI.2 and confirming the high capacity of recombination within lagoviruses.
Our results show that the strains circulating at the time of the first noticed GI.2 outbreaks were already recombinants between a European non-pathogenic strain (GI.3), that was donor for the non-structural part, and the new lagovirus genotype, GI.2, that formed the structural part. Indeed, in our ML tree for the non-structural part, the strains from the first GI.2 outbreaks (10-28 and 10-32) grouped with the non-pathogenic GI.3 group, while for the structural part these strains clustered with all the strains of the newly emerged GI.2 genotype. Strains from the earliest outbreaks in Spain and Portugal in 2011-2012 24 , and strains collected in the Canary Islands (Spain) 35 , France, the Netherlands 34 and Canada in 2016 are also GI.3/GI.2 recombinants. These results, that were confirmed by the RDP, the ML trees and the tanglegram analyses, demonstrate that all the GI.2 strains described so far, including those that were considered non-recombinants and that in some instances co-circulated with other recombinant GI.2 strains 24 , resulted from recombination. Several recombination events are thus associated with the evolution of this new genotype: with GI.3, GI.4, GI.1b and GI.1a, with a recombination breakpoint at the non-structural/structural boundary 24,26 . An additional recombination event was further previously reported with a breakpoint at the p16/p23 boundary and involved GI.4 as the donor of p16, while GI.3/GI.2 (see also supplementary information) or GI.1b/GI.2 recombinants were the donors for the remaining viral genome 25 .
Notably, the pattern described here for lagoviruses, with a recombination hotspot at the start of the major structural gene that encodes the capsid, is typical of caliciviruses [36][37][38] . Indeed, the combination of low sequence divergence 39 , presence of complex RNA secondary structures 39 that may promote template switching by the virus polymerase, and the existence of a subgenomic RNA 40 that may act as a secondary template when RNA replication

16-35
CBCoruche14-2_Portugal_2014 www.nature.com/scientificreports/ is resumed upon template switching, seem to contribute for the high rates of recombination observed in this region in the genome of caliciviruses and that likely constitutes an important survival strategy in the evolution of this family 41 . This strategy, that resembles antigenic shift in Influenza virus 41 , allows caliciviruses to rapidly generate diversity by producing new genomic combinations, which might be beneficial for the adaptation to new hosts and environments, and to overcome selective pressures. Recombination is important in shaping the evolution of RNA viruses, including the closely related picornaviruses 42 and coronaviruses 43 and, more importantly, is often associated with the emergence of new viruses 44 and even families, e.g. the Hepeviridae family 45 . As for the recombination involving the breakpoint at the p16/p23 boundary, it may have originated from a non-replicative recombination mechanism where RNA strands are randomly self-ligated or joined by cellular ligases 46 . Atypical recombination breakpoints such as this have been also observed for other caliciviruses and a similar mechanism proposed 47 . Estimation of the time to the most recent common ancestor (tMRCA) of GI.2 by using capsid sequences pointed to an emergence in July 2008 25 . Although the estimated substitution rates may be inaccurate due to limited sampling, reduced sequence variation and low temporal spread, they tend to underestimate rather than overestimate the real tMRCA 48,49 . Thus, it is possible that despite surveillance efforts, GI.2 circulated unnoticed in wildlife prior to its detection 50 . The virus that recombined with GI.3 to produce this new genotype is currently unknown and undetected, even with the molecular surveys of lagoviruses performed in European leporids and our improved knowledge of the complete coding sequences of both pathogenic and benign lagoviruses. The same occurs for some older recombinant Iberian strains where the virus that originated the non-structural part has never been detected 51 . Both viruses, the one originating GI.2 and the one of these Iberian strains, could have either circulated harmlessly prior to their detection, thus making difficult to detect them, or have become extinct due to their lower fitness as a non-recombinant form, as suggested for noroviruses that recombined and whose partial sequences were never found 52 .
The evidence for recombination with GI.1a 26 , GI.1b 24 , GI.3 and GI.4 24,25 (see also supplementary information) reveals that GI.2 successfully recombines with a great diversity of pathogenic and non-pathogenic lagoviruses. This, together with the tMRCA estimates and the hypothesis of circulating harmlessly before its emergence, supports an evolution from a non-pathogenic form that acquired pathogenicity 28 . We suggest that evolution of www.nature.com/scientificreports/ pathogenicity was not driven solely by point mutations, but was aided by recombination events. Similarly to other RNA viruses, lagoviruses genomes may function as interchangeable modules rather than as strict genomes, which leads to the appearance of mosaic-like genomes through recombination, resulting in a semi-independent evolution of structural and non-structural genes 53,54 and with low-fitness regions being eliminated by recombination 55 . This might help to explain how a non-pathogenic strain (GI.3), combined with a potentially benign strain (GI.2), gave rise to a highly lethal, pathogenic strain. Non-pathogenic forms are thus likely to be involved in the evolution of pathogenic forms and recombination might have precipitated the emergence of Lagovirus europaeus/ GI.2. The hypothesis of a species jump may not be discarded. Indeed, reservoir species could have acted as source for the original virus that jumped to the final host, the European rabbit 28 . Whether recombination occurred in the reservoir species or in the final host is unknown. Nonetheless, reservoir species have yet to be identified in which lagoviruses could have evolved and replicated without being detected before "jumping" into the European rabbit (Le Gall-Reculé, personal communication) [56][57][58][59] . Genetic mechanisms such as recombination require coinfection of a host cell by two (or more) viruses. The high frequency of recombination detected in lagoviruses also implies a high frequency of co-infection, either in the host population or in the reservoir species. Leeks and colleagues established a link between co-infection and viral diversity, as more diverse populations can produce more virulent infections and better adapt to new hosts 60 . This might explain the array of hare species infected by GI.2 [13][14][15][16][17] , which constitutes a novelty in relation to GI.1.
Our results add complexity to the diversity of lagoviruses, but help to elucidate our current understanding on the emergence of pathogenic forms. In addition, they show that full genomic sequences of lagoviruses are crucial to understand their evolutionary history and genetic relationships. Future studies should focus on unravelling the role of the structural and the non-structural parts in the emergence of pathogenicity in lagoviruses.

Methods
Virus samples and full-length genome sequencing. No animals were captured, handled or killed specifically for the purpose of this study. Duodenum samples were collected from four hunted French rabbits by the National Hunting and Wildlife Agency (ONCFS) during the hunting seasons in 2007-2008 and 2008-2009. Presence of lagoviruses had previously been detected in these samples and phylogenetic relationships of the obtained capsid VP60 gene sequences had been determined 9 ; these were identified as GI.3 strains. Four liver samples were further collected in France from dead rabbits collected in three rabbitries affected by RHD, two on November 2010 (10-28 and 10-32 GI.2 strains 11 ) and one in June 2016 (strain [16][17][18][19][20][21][22][23][24][25][26][27][28][29][30][31][32][33][34][35], and in the field on June 2016 (strain . The presence of GI.2 in the 16-35 sample was diagnosed by Labovet Conseil (Les Herbiers, France). We characterised this strain as well as the 16-36 one based on their complete VP60 gene sequences, confirming that they belong to the GI.2 genotype.
For the sequencing of the complete coding genome of GI.3 strains, ~ 30 mg of duodenal tissue were homogenized in a mixer-mill disruptor (TissueLysed, Qiagen, Hidden, Germany). RNAs were first extracted with TRIzol ® (TRIzol LS Reagent, Ambion, Thermo Fisher Scientific, Villebon-sur-Yvette, France) before using the RNeasy Mini kit (Qiagen, Hilden, Germany) to increase sensitivity 2 . For GI.2 strains, RNA was extracted from 100µL of liver exudate using the RNeasy Mini kit. Except for two GI.2 strains (16-35 and 16-36) for which the coding sequence was obtained using NGS (see below) from purified and quantified RNA using the RNeasy MinElute cleanup kit (Qiagen, Hilden, Germany) and the Qubit RNA HS Assay Kit with the Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Villebon-sur-Yvette, France), respectively, RNAs were reverse-transcribed using oligo-dT as a primer and Maxima Reverse Transcriptase (Thermo Fisher Scientific, Villebon-sur-Yvette, France). Protocols were performed as recommended by the manufacturers.
According to the strains, different PCR strategies were attempted for genome sequencing either by using combinations of newly designed primers (primer sequences available upon request) or primers previously published 3,11,14 . Thus, cDNAs of JA10/08-10, BO25/08-133 and CHA20/09-100 GI.3 strains were amplified using several overlapping PCRs, including one that covered the recombinant breakpoint between the non-structural and structural encoding genes, using Expand High-fidelity PCR System (Roche, Sigma-Aldrich, Saint-Quentin-Fallavier, France). For JA34/09-48 GI.3 strain, due to lower viral loads, after two first PCRs that amplified two overlapping fragments of about 5,700 and 6,700 bp, respectively, several nested or semi-nested overlapping PCRs were performed. The complete coding sequences of the four GI.2 genomes were obtained following either one PCR amplification of about 7,300 bp using the primers 1U 14 and 15L 11 followed by smaller overlapping PCRs for sequence confirmation (10-28 and 10-32; primer sequences available upon request), or NGS (16-35 and 16-36, see below). For these two last strains, the 5′ end sequence of the CDS (530 bp) was confirmed using the PCR primers 1U and 1L as described in Le Gall-Reculé et al. (2017) 14 . PCRs were performed using Expand High Fidelity enzyme (Roche-Applied-Science). The different PCR products were analysed by electrophoresis, purified using the MinElute PCR Purification kit (Qiagen, Hilden, Germany) and quantified using the Qubit dsDNA HS Assay Kit with the Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Villebon-sur-Yvette, France). DNA sequences were determined with an ABI Prism 3100 Genetic or a 3500 Series Genetic Analysers in both directions (Applied Biosystems, Foster City, CA, USA), using the PCR and several inner primers and Big Dye Terminator v3.1 (Life Technologies) as recommended by the manufacturer. The 5′ and 3′ ends of the 10-28 and 10-32 strains were obtained using the rapid amplification of cDNA ends (RACE) method following the protocol developed in Lemaitre et al. (2018) 30 . PCR products were purified and sequenced as described above. Consensus sequences were compiled using Vector NTI Advance software (Life Technologies, Thermo Fisher Scientific, Villebon-sur-Yvette, France).
For NGS, cDNA libraries were prepared using the Ion Total RNA-Seq Kit (Life Technologies, Carlsbad, CA, USA) according to a protocol adapted from the supplier's instructions (available upon request from the authors). Sequencing was performed using Ion Proton Sequencer (Life Technologies). The reads were cleaned with the The aligned reads of this third alignment were collected and then down-sampled to fit a global coverage depth estimation of 80x. These cleaned reads were submitted to the SPAdes 64 (version 3.10.0) de novo assembler and their raw related reads to the Mira 65 (version 4.0.2) de novo assembler. The de novo contigs were then submitted to MegaBLAST 66 on a local copy of nucleotide databank. The best lagovirus reference identified with MegaBLAST was used in a last BWA alignment of cleaned reads. The genomic sequences produced in this study have been deposited in GenBank under the accession numbers: MN746288, MN746289, MN737116, MN737117 (GI.3 strains JA10/08-10, BO25/08-133, JA34/09-48 and CHA20/09-100, respectively); MN737113, MN737114, MN738377, MN786321 (GI.2 strains 10-28, 10-32, 16-35 and 16-36, respectively).

Recombination analysis.
Complete coding sequences obtained in this study were aligned with publicly available complete coding sequences of Lagovirus europaeus representing genotypes GI.1, GI.2, GI.3 and GI.4 in BioEdit software version 7.0.3 67 . The final dataset included 221 sequences, 7,369 nucleotides in length (see supplementary information for the list of the sequences used). The dataset was screened for recombination by seven methods available in the RDP software version 4.4 (RDP, GENECONV, BootScan, MaxChi, Chimaera, SiScan and 3Seq) 32 with the following parameters: sequences were set to linear, Bonferroni correction, highest acceptable p-value of 0.05 and 500 permutations. Only recombination events detected by three or more methods were considered.
Recombinant strains were visualized by plotting a tanglegram using the "dendextend package" 68 in the RStudio software (version 3.6.1) 69 . Phylogenetic analysis. Pairwise nucleotide distance comparison (p-distance model) was computed using MEGA6 70 .
Following the results obtained with RDP, phylogenetic analyses were carried out separately for the following genome partitions: nucleotide positions (i) 1-429 (p16; supplementary information), (ii) 430-5,295 (p23, p37, p29, VPg, protease and RdRp), and (iii) 5,296-7,369 (VP60 and VP10). Maximum-likelihood phylogenetic trees were inferred in MEGA6 70 using the best model of nucleotide substitution determined in the same software for the different genome partitions, according to the lowest AICc value (Akaike information criterion, corrected). Support for each cluster was obtained from 1,000 bootstrap replicates.