Spread of equine arteritis virus among Hucul horses with different EqCXCL16 genotypes and analysis of viral quasispecies from semen of selected stallions

Equine arteritis virus (EAV) is maintained in the horse populations through persistently infected stallions. The aims of the study were to monitor the spread of EAV among Polish Hucul horses, to analyse the variability of circulating EAVs both between- and within-horses, and to identify allelic variants of the serving stallions EqCXCL16 gene that had been previously shown to strongly correlate with long-term EAV persistence in stallions. Serum samples (n = 221) from 62 horses including 46 mares and 16 stallions were collected on routine basis between December 2010 and May 2013 and tested for EAV antibodies. In addition, semen from 11 stallions was tested for EAV RNA. A full genomic sequence of EAV from selected breeding stallions was determined using next generation sequencing. The proportion of seropositive mares among the tested population increased from 7% to 92% during the study period, while the proportion of seropositive stallions remained similar (64 to 71%). The EAV genomes from different stallions were 94.7% to 99.6% identical to each other. A number (41 to 310) of single nucleotide variants were identified within EAV sequences from infected stallions. Four stallions possessed EqCXCL16S genotype correlated with development of long-term carrier status, three of which were persistent shedders and the shedder status of the remaining one was undetermined. None of the remaining 12 stallions with EqCXCL16R genotype was identified as a persistent shedder.


Results
. Out of 17 mares introduced to the stud in 2012, 14 tested negative for EAV antibody at both 05/2012 and 12/2012 samplings, while the remaining three had viral neutralisation test (VNT) titres that ranged between 8 (two mares) and 32 (one mare) at 05/2012, and remained the same six months later at the 12/2012 sampling. A sharp increase in the number of EAV seropositive mares ((χ 2 = 48.7, p < 0.00001) was observed between 12/2012 and 05/2013 samplings. All 14 seronegative mares that were introduced to the stud in 2012 seroconverted to EAV by May 2013 with titres ranging from 32 to 128. A rising EAV titres were detected in the remaining three seropositive mares, with a four-fold or greater increase in the EAV titre observed for two of these mares. The proportion of EAV seropositive stallions remained similar within the same period ((χ 2 = 0.45, P = 0.45).
Out of 37 semen samples from 11 stallions that were tested for EAV RNA, 21 were positive ( Table 2). The positive samples were collected from six stallions, including three that tested positive more than once, and were hence classified as permanent shedders (hucPL2, hucPL3 and hucPL5).
Genomic analysis of EAV. Full    of EAV from different horses varied from 94.7% to 99.6%. Based on the phylogenetic analysis, EAV sequences from the current study were closely related to other Polish EAV sequences from the last two decades (Fig. 1). Single nucleotide variants (SNVs) were analysed in order to determine the intra-host variability of EAV. The total coverage for sequences from EAVhucPL4(05/2013) and EAVhucPL3(01/2009) was considerably lower than the total coverage obtained for viral sequences from the remaining samples, and below a value of 100 recommended for variant detection studies with next generation sequencing (NGS) 13 . Hence, the NGS data for EAV sequences from hucPL4 was not included in the SNV analysis. Multiple variable sites were identified throughout the viral genomes from each of the remaining three stallions, with the highest number (n = 310) detected in EAVhucPL2(05/2013) and the lowest (n = 41) in EAVhucPL2(01/2009) ( Table 3). Distribution of synonymous and non-synonymous SNVs is presented in Fig. 2. For all EAV sequences included in the analyses, the majority of non-synonymous SNVs were located in ORF1a fragment encoding non structural protein (nsp) 2 (Table 4) and in genes encoding viral surface glycoprotein (GP) 2, GP3, GP4 and GP5 (Table 5).
Two samples collected four years apart were available from stallion hucPL2. Based on the analysis of these two samples, the number of SNVs increased from 41 to 310 between 2009 and 2013. Some (n = 12) low-frequency SNVs present in 2009 became predominant in 2013, while others (n = 13) disappeared (Fig. 3).
EqCXCL16 genotypes of EAV-infected stallions. Among 16 stallions sampled in the study, 12 (75%) were homozygous for EqCXCL16R allele and hence had a genotype associated with resistance of CD3 + T lymphocytes to EAV infection and decreased likelihood of development of long-term carrier status, and four (hucPL2, hucPL3, hucPL4 and hucPL5) were heterozygotes (EqCXCL16R/EqCXCL16Sa) and hence were considered to have a susceptible phenotype correlated with an increased likelihood of establishment of long-term carrier status following EAV infection (Fig. 4).

Discussion
Equine arteritis virus, similarly to other RNA viruses, is characterized by a high genetic variability. This results in generation of closely related genetic variants, known as quasispecies, within an individual host 14 . Until recently, quasispecies were identified by Sanger sequencing of cloned PCR products amplified from either viral isolates or directly from semen of infected horses 15 . In the current study NGS was used for sequence analysis of EAV. Six full consensus sequences of EAV from four different stallions were obtained, with the sequencing depth that allowed for identification of genetic variants appearing with as low as 10% frequency for four of them. Although multiple variable sites were found in each of the analysed viral sequences, their total number differed between EAVs from the three stallions. The lowest number (n = 41) was found in the 2009 EAV sequence from hucPL2, followed by the 2013 sample from hucPL1 (n = 140). Both numbers represented less than half of the SNVs identified in the EAV sequences obtained from the two remaining samples, including a sample from stallion hucPL2 collected in 2013. It has been previously shown that EAV remains genetically stable after the onset of infection and during early persistence, but its variability increases over time in response to pressures from the host's immune system 16,17 . Our results support this conclusion, as hucPL3 and hucPL2 were persistently infected with on-going viral replication for at least six years prior to samplings in 2012/13, while hucPL1 was EAV negative until 2013.  In addition, there was a clear increase in the intra-host diversity between EAV sequences from stallion hucPL2 collected four years apart (Table 3, Fig. 3). The highest numbers of SNVs were identified in genes encoding nsp2, GP2, GP3, GP4 and GP5 of the virus. In general, this was consistent with the results of the previous studies using classical sequencing methods 12,15 . Similarly, in the most recent study that employed NGS, genes coding for GP3, GP5 and nsp2 showed the highest level of variability with the highest intra-host evolutionary rates recorded for ORF3 and ORF5, coding for GP3 and GP5 18 .
In agreement with the previous data 19 , non-synonymous SNVs in ORF5 were located predominantly in three variable regions (V) of GP5: V1 encompassing aminoacid (aa) positions 61-121, V2 (aa 141-178), and V3 (aa 202-222). The V1 has been previously shown to contain three out of four major neutralization sites (aa 49, 61, 67-90, and 99-106) 20 . Each of the analysed viruses showed non-synonymous SNVs in at least one of these four major neutralization sites within GP5 20 . Hence, our data support the previous assumption that variations in ORF5 is driven by the immune selection pressure 17 .
Non-synonymous SNVs were also found in genes coding for minor EAV glycoproteins (GP2, GP3 and GP4). It has been previously established that those proteins function in trimeric complexes. Hence, there are some structural constraints limiting their variability in regions responsible for binding 21 . This was consistent with our NGS results, as all non-synonymous SNVs in ORF2b (coding for GP2) were located in two previously described variable regions: V1 (aa 1-33) or V4 (aa 161-227) 22 . No SNVs were found at positions encoding cysteine (aa 48, 102, 137 and 195) in any of the viruses analysed, consistent with the importance of these residues for binding other minor glycoproteins 23 . All fixed amino acid changes within GP2 from a persistently infected stallion that was followed over a seven-year period in a previous study were also limited to the same two regions 22 .
Variability in GP3 was limited to two previously described highly variable sites (aa 3-41 and 98-121) 24 . It has been speculated that variants that had lost both glycosylation sites within the overlapping sequon ( 28 NNTT 31 ) through a change from asparagine to another amino acid at positions 28 and 29 may be selected during EAV persistence. This in turn could lead to the cleavage of the signal peptide in N-terminal end of GP3, although it www.nature.com/scientificreports www.nature.com/scientificreports/ is currently unknown what, if any, functional advantage would this create for the virus 25,26 . Our results do not support these speculations, as variants lacking one of the glycosylation sites (aa 29, Table 5) were found only in a recently infected stallion hucPL1, while asparagine was conserved at both positions in stallions hucPL2 and hucPL3 that have been persistently infected with EAV for years.
No variability was found in any of the glycoslation sites (aa 33, 55, 65 and 90) in predicted GP4 sequences. However, SNVs were found at nt positions 11 or 109 (corresponding to aa positions 4 and 37) in EAV sequences from two stallions. It has been proposed that these two SNVs may be involved in the attenuation of the vaccine strains of EAV 27 . Interestingly, substitution at aa position 37 of GP4 was found only in EAV from placenta of an infected pregnant mare, but not in EAV from any of the tested carrier stallions 28 . Therefore, the role of this SNV remains to be established, but it is unlikely that it is associated with carrier status of the host.
The only nsp where multiple SNVs were detected in the current study was nsp2. It has been previously shown that this protein induces humoral antibody responses in both naturally infected and vaccinated horses 29 . The majority of SNVs identified in EAVhucPL3(12/2012) and EAVhucPL2(05/2013) (two long-term carriers) were located in the part of ORF1a encoding a hypervariable region (aa 388-480) 12,30 . However, the pattern of SNVs in stallions that were recently infected with EAV at the time of sampling looked different. Most of SNVs in EAVhucPL1(05/2013) were located within the central region of nsp2, and only one SNV was present in nsp2 from EAVhucPL2(01/2009). Hence, it appears that SNVs accumulate within the hypervariable region of nsp2 during persistence, most likely due to selection driven by the immune response.
Next generations sequencing analysis performed in this study allowed for the determination of frequency and location of SNVs, and for the analysis of changes in the make-up of viral quasispecies within one persistently infected stallion over a period of time. Although this provided some insights into the changes in the dominant SNVs present at various time-points post-infection, a larger group of persistently infected stallions would need to be tested to confirm the observed tendencies and to determine whether or not some SNVs are linked to each other. It was not possible to reconstruct full-length sequences of individual viral haplotypes within the quasispecies population of EAV in each sample due to the use of short read sequencing. Application of novel NGS sequencing methods such as linked-read sequencing or long-read sequencing may allow to overcome this difficulty in the future 31 .
A number of environmental, viral and host factors may affect the outcome of EAV infection and establishment of persistent infection. These include factors such as breed of the horse, viral genotype or management 32,33 .  ORF1ab  225-9751  110  90  20  19  14  5  237  210  27  217  198  19   ORF1a  225-5399  67  51  16  13  9  4  137  116  21  120  107  13   ORF1b  5405-9751  43  39  4  6  5  1  100  94  6  97  91  6   nsp1  225-1004  9  9  0  4  2  2  24  19  5  12  12  0   nsp2  1005-2717  24  14  10  3  2  1  54  40  14  48  37  11   nsp3  2718-3416  5  5  0  3  3  0  15  14  1 Table 3. Total number and distribution of synonymous and non-synonymous single nucleotide variants in EAV genomes from three infected stallions including two confirmed persistent shedders (hucPL2 and hucPL3, see Table 2). a As compared to EAV Bucyrus strain (NC_002532 www.nature.com/scientificreports www.nature.com/scientificreports/ Stallions in the current study were all of the same breed, managed under similar conditions, and were exposed to closely related variants of EAV. Hence, these factors were unlikely to be responsible for the observed differences in the duration of viral shedding by infected stallions. Results of two recent studies suggested that stallions with specific alleles of EqCXCL16 gene, which are linked to in-vitro susceptibility of CD3 + T lymphocytes to EAV infection, were at increased risk of becoming persistent shedders of the virus 11,32 . The majority (12/16) of stallions in the current study were homozygous for the resistant (EqCXCL16R) genotype. Unfortunately, data from reverse transcriptase quantitative polymerase chain reaction (RT-qPCR) and serology were available for only some of the stallions at each of the six sampling occasions, which is a limitation of the study. Nonetheless, 3/16 stallions (hucPL2, hucPL3 and hucPL5) tested positive for EAV RNA in the semen for a period of at least two years and as such, could be classified as persistent shedders. All three stallions had an EqCXCL16 genotype associated with in vitro susceptibility to EAV infection and development of long-term carrier status. One stallion (hucPL11) with resistant genotype was positive for EAV RNA at only one sampling occasion and negative at all subsequent samplings, and hence has presumably cleared the infection. For six stallions the shedder status could not be determined as none, or only one, semen sample was tested by RT-qPCR for the presence of the virus. This group included five stallions with a resistant genotype and one (hucPL4) with a susceptible genotype. The remaining six stallions with a resistant genotype did not show any evidence of EAV infection (were serologically negative) at the time(s) they were tested. Altogether, out of the four EAV positive stallions that could be confidently classified as shedders or non-shedders, only one stallion with the resistant EqCXCL16 genotype cleared the virus. The remaining three that became persistent shedders all had susceptible EqCXCL16 genotypes. This is in agreement with the results reported previously where 29/34 (85.3%) stallions with a resistant genotype cleared the virus following EAV infection, whereas 32/43 (74.4%) stallions with the susceptible genotype became persistent shedders 11 .
Serological analysis performed in this study was the extension of the previous study 6 where EAV seropositivity among horses at the same Hucul stud was monitored between 2006 and 2008. In that study, the differences in EAV antibody status between various age groups were described, while in the current study we were interested in the changes in the serological status of horses over a period of time. For the first five samplings the percentage of EAV seropositive horses remained relatively low around 20-30%, compared with over 55% reported previously 6 . However, between 2012 and 2013 seroconversion was detected in the majority of the tested mares, which resulted in the increase of the EAV seropositive horses to over 90% at the last sampling. Based on the interview with the stud's staff, at least four of the EAV infected stallions, including three persistently infected ones (hucPL2, hucPL3 and hucPL5), were used for breeding in 2012 and 2013. It is not unusual to use persistently infected stallions for www.nature.com/scientificreports www.nature.com/scientificreports/ reproduction 34 providing that certain precautions are taken including ensuring that the stallions are bread only seropositive mares and such mares are separated for up to three weeks following cover 35 . Such precautions were not taken when a group of 17 EAV seronegative mares was introduced to the stud in 2012 and bred for the first time during 2012/13 breeding season. All these mares showed serological evidence of EAV infection between December 2012 and May 2013. It is likely that the initial spread of the virus was via venereal route, however subsequent spread probably also involved respiratory route and possibly fomites, as two previously seronegative stallions (hucPL1 and hucPL4) became EAV positive in 2013. The sharp increase in the proportion of EAV seropositive horses from 22% to 84% over a period of six months provides a good example of how quickly the virus can spread within any horse population if appropriate infection control precautions are not maintained.
In conclusion, the results of this study add to our understanding of EAV epidemiology and genomic variability. To our knowledge, this is the first study that involved horses of the primitive Hucul breed. We have shown that intra-host diversity of EAV sequences is concentrated at specific sites of the genome. The impact of these variations on the expression of disease and virus-host interaction needs to be elucidated in future research.

Methods
Horses and sampling. Hucul horses from a national horse stud located in the southern part of Poland were used for the study. Results of our previous study demonstrated the circulation of EAV within this herd 6 . Coagulated blood samples for serology were collected between December 2010 and May 2013 every six months from both mares and stallions. In total, 221 blood samples from 16 stallions and 46 mares were collected. These included samples from 17 mares that were introduced to the stud shortly before May 2012 sampling.
Semen was collected from seropositive stallions whenever possible and tested for the presence of viral RNA. Altogether, 35 semen samples from 11 stallions were collected. In addition, two archival semen samples from January 2009 from EAV positive horses hucPL2 and hucPL3 were included in the analyses. Eight stallions from the sampled group (hucPL1, hucPL2, hucPL3, hucPL5, hucPL8, hucPL10, hucPL11, and hucPL13) were used for breeding at least once during the period of the study.   Table 4. Non-synonymous single nucleotide variants in non-structural protein 2 (nsp2) of equine arteritis virus (EAV) from three infected stallions including two confirmed persistent shedders (hucPL2 and hucPL3, see Table 2). a Nucleotide position in ORF1a gene (GenBank accession number NC_002532. www.nature.com/scientificreports www.nature.com/scientificreports/ As the sampling and testing was done as part of veterinary management of the stud, no additional approval of the Local Ethics Committee was required for these activities (Directive 2010/63/EU). ORF2a (E)  145  49  A  -A  -A  -T  A  187/26  87.8   ORF2b  8  3  R  -R  -R  L  68/49  58.1  R  -(GP2)  20  7  L  S  30/16  65.3  S  -S  -L  -26  9  C  -C  -C  -C  Y  129/94  57.8   50  17  C  -C  -C  -C  Y Table 5. Non-synonymous single nucleotide variants in genes encoding structural proteins of equine arteritis virus (EAV) from three infected stallions including two confirmed persistent shedders (hucPL2 and hucPL3, see Table 2) www.nature.com/scientificreports www.nature.com/scientificreports/ Serology. Antibodies to EAV were detected using the VNT performed as described in the OIE Manual 36 .
Briefly, serial two-fold dilutions of heat-inactivated (30 min at 56 °C) serum samples were added to each well of the 96-well microtitre plate in duplicate and mixed with 25 μL (100 to 300 tissue culture infectious dose 50% (TCID 50 )) of the Bucyrus strain of EAV diluted in Eagle's minimum essential medium (MEM) (Sigma-Aldrich) containing guinea-pig complement at a final concentration of 10%. After incubation for 1 h at 37 °C, 50 μL (approximately 5 × 10 3 cells) of rabbit kidney 13 (RK13) cells (ATCC CCL-37) were added to each well. The plates were sealed with parafilm and incubated for three to five days at 37 °C in a humidified atmosphere with 5% CO 2 . Appropriate virus-, cell-and serum controls were included with each test run. The neutralisation titres were  Viral RNA extraction and EAV-specific RT-qPCR. Total RNA was extracted directly from seminal plasma using QIAamp Viral RNA Mini Kit (Qiagen). Extractions were performed according to the manufacturer's instruction with the exception that linear polyacrylamid (1 µL/sample) was used as carrier RNA. Reverse transcriptase qPCR was performed using Quantitect Viral Kit (Qiagen) and primers/probe complementary to a highly conserved region within the ORF7 gene of EAV as previously described 37 . Each reaction consisted of 1x QuantiTect Virus Master Mix, 1x QuantiTect Virus RT Mix, 0.8 µM of each primer, 75 nM of Taq Probe and 5 µL of template RNA in a total volume of 25 µL. The amplification conditions included the reverse transcriptase (RT) step (35 min at 48 °C), followed by the initial denaturation (10 min at 90 °C), and 40 cycles of denaturation (15 sec at 95 °C) and annealing/elongation (1 min at 60 °C). Samples were considered positive if quantification cycle (Cq) values were lower than 38. Extraction controls consisted of a Bucyrus strain of EAV (ATCC VR-796, positive control) and semen from an EAV seronegative stallion (negative control). Amplification controls consisted of RNA extracted from the Bucyrus strain (positive control) and water (non-template control). The RT-qPCR run was considered valid if all controls showed the expected results.
Next-generation sequencing. Six samples from four EAV infected stallions were subjected to NGS. These  6 whereas stallions hucPL1 and hucPL4 tested EAV positive for the first time in 2013. Prior to sequencing, all samples were concentrated by ultracentrifugation. Briefly, 6 mL of seminal plasma was diluted five times in phosphate buffered saline pH 7.3 to 7.5 (PBS) and centrifuged at 3,000 × g for 10 min. Supernatant was then layered onto a 30% glycerol cushion (5 mL) in ultraclear thinwall polypropylene ultracentrifuge tube (Beckman). Following addition of PBS to the final volume of 35 mL, each sample was centrifuged for 2 h at 236,000 × g at 4 °C in a 70Ti fixed angle rotor (Beckman). Supernatant was removed and the pellet was resuspended in 0.5 mL of PBS. Viral RNA for NGS was extracted from this preparation using the same procedure as described above for RT-qPCR. Next generation sequencing was performed at Genomed SA using Illumina MiSeq Personal Sequencer.
Analysis of NGS data. Quality of 2 × 250 base pairs (bp) Illumina reads was checked with FastQC (http:// www.bioinformatics.babraham.ac.uk/projects/fastqc). Overlapping reads were merged using bbmerge v36.62 38 and trimmed with Trimmomatic v0.36 39 to remove adapters and low quality reads. Next, reads mapping to ribosomal RNA database SILVA v. 128 40 were excluded (BWA MEM 41 0.7.12 with default parameters), and the remaining reads were assembled de-novo into contigs using SPAdes 3.9.0 42 . Contigs containing EAV genomic sequences were identified in each sample using an online version of the NCBI blastn program (v2.8.0) 43 , optimized for highly similar sequences (Megablast). The largest contig with an EAV reference sequence as the top hit was selected for futher analysis.
To identify SNVs within viral populations present in each sample, we used BWA MEM 41 with default parameters to map trimmed reads to the reference EAV sequences. A consensus viral sequence obtained from the relevant de-novo assembly was used as the reference sequence for stallions PL1 and PL4, while EAV sequences from 2009 samples were used as reference sequences for stallions PL2 and PL3. Variants were called using LoFreq. 2.1.2 with default parameters 44 . Frequency distribution of SNVs across each viral genome was visualized using R 3.4.1 45 . We included only variant alleles evidenced by at least five reads (alternative allele depth; as defined in 46 ) and present in at least 10% of the reads at a given loci (alternative allele frequency).
Phylogeny. Neighbour-joining phylogenetic tree was constructed based on the alignment of 2,895 nt long fragments of the viral genome encoding structural proteins (positions 9751-12645 in Bucyrus reference strain -NC_002532) using MEGA5.2 software 47 . Viral sequences obtained in this study, and selected EAV sequences previously detected in Poland (n = 4) and other countries (n = 15) were included in the analysis.

Analysis of EqCXCL16 gene variant of the stallions.
Total DNA was extracted from seminal plasma or serum samples (when semen was not available) collected from each of the stallions using QIAamp DNA Mini Kit (Qiagen) according to the manufacturer's instructions. PCR reactions were performed using JumpStart Taq DNA Polymerase (Sigma-Aldrich). Each reaction mix consisted of 0.4 µM of each primer (CXCL16-F and CXCL16-R 11 ), 0.4 µM of dNTP mix, 1 µL of JumpStart Taq DNA Polymerase, and 2 µL of DNA in 1× reaction buffer in a total volume of 25 µL. Specific PCR products (280 bp) were sequenced using the same primers. EqCXCL16 sequences were assigned to a genotype based on the presence of specific nucleotides at positions 715, 741, 744 and 750 following alignment with the reference sequence (XM_001504756) as described previously 11 . Two susceptibility alleles (T, C, A, A and T, C, G, A) were designated EqCXCL16Sa and EqCXCL16Sb, respectively. The recessive allele associated with resistance to in-vitro EAV infection of and development of long-term carrier status the establishment of persistent EAV infection (EqCXCL16R) was characterized by A, G, T, and G at the specified positions. All analyses were performed using MEGA5.2 software 47 .