Identification of candidate genes associated with bacterial and viral infections in wild boars hunted in Tuscany (Italy)

Wild boar (Sus scrofa L.) is one of the large mammals most spread worldwide, highly adaptable, and its population rapidly increased in many areas in Europe, including Italy, where Tuscany is considered particularly suitable for wild boar. Wild boars are potential hosts for different etiological agents, such as Brucella spp., Leptospira spp. and Pseudorabies virus and they can contribute to maintain and/or to disseminate some bacterial or viral pathogens to humans and domestic animals, above all-in free-range farms. In order to identify hypothetical genomic regions associated with these infection diseases, 96 samples of wild boars hunted in Tuscany during the 2018–2019 and 2019–2020 hunting seasons were considered. Diagnosis was achieved by serological tests and 42 Pseudorabies, 31 Leptospira and 15 Brucella positive animals were identified. All animals were genotyped with Geneseek Genomic Profiler Porcine HD (70 k) and a genome-wide scan was then performed. Significant markers were highlighted for Pseudorabies (two SNPs), Brucella (seven SNPs), and Leptospira (four SNPs) and they were located within, or nearby, 29 annotated genes on chromosome 6, 9, 12, 13, 14 and 18. Eight genes are implicated in viral (SEC14L1, JMJD6, SRSF2, TMPRSS2, MX1, MX2) or bacterial (COL8A1, SPIRE1) infections, seven genes (MFSD11, METTL23, CTTNBP2, BACE2, IMPA2, MPPE1 and GNAL) are involved in mental disorders and one gene (MGAT5B) is related to the Golgi complex. Results presented here provide interesting starting points for future research, validation studies and fine mapping of candidate genes involved in bacterial and viral infections in wild boar.

The wild boar (Sus scrofa L.) is widely distributed throughout Eurasia from Europe to the Far East, including SouthEast Asia, and extending to North Africa 1 ; it is considered the second most abundant ungulate species in Europe 2 .
In Italy, the wild boar population is widely diffused 3 , reaching high-density levels in specific areas 4 and Tuscany is particularly suited to the wild boar. This is evidenced by the high number of animals hunted in this area [3][4][5] .
The high density of wild boar in a particular area is a serious problem for the agricultural economy, causing extensive damage to croplands 6 and may represents a severe hazard for both animals and human health.
In fact, it is known that wild boar can be the host for different etiological agents, thus contributing to maintaining and/or disseminating important zoonotic diseases 7 , as well as leptospirosis and brucellosis. Furthermore, other not-zoonotic diseases, such as the Pseudorabies (PrV) or Aujeszky's disease, also spread by wild boar have a large economic impact on the swine industry.
Brucella is a zoonotic Gram-negative bacterium. Among the different species, B. abortus, B. suis, and rarely, B. melitensis can infect swine as well as wild boar (Sus scrofa) [8][9][10] . Tuscany, as many other Italian regions, is free from bovine and ovine brucellosis from several years, thanks to the progress of the eradication plan implemented throughout the country since 1992 and 1994 (D.M. 2/7/92 n. 453; D.M. 27/8/94 n. 651; EFSA 2017). B. suis biovars 1 and 3 are rarely reported in Europe, while B. suis biovar 2 (bv. 2) is largely diffused in East Europe, and it was recently introduced in Italy where it was isolated from domestic pigs and wild boar 8,10,11 . Wild boar represents one of the main reservoirs of B. suis bv. 2, which is responsible for reproductive disorders such as infertility, abortion, stillbirths, decreased litter size, weak piglets, orchitis and epididymitis in males, and focal abscess formation 10 . Table 1 reports the prevalence of positive samples over the total of 96 for Pseudorabies, Brucella and Leptospira. Positivity to two etiological agents was detected only in 5/96 samples (5.21%; 95% CI 2.00-8.42%). Eight were the negative samples. To calculate the confidence interval (CI) and to assess the prevalence, a binomial logistic regression was performed.

Results
Genome-wide scan was performed, and 13 significant SNPs were identified, as shown in Table 2. Two SNPs were significantly associated to Pseudorabies (Fig. 1). ASGA0084173 is situated on chromosome 12, with SEC14L1 and MGAT5B genes upstream located and JMJD6, MXRA7, MFSD11, METTL23, SRSF2 downstream located; WU_10.2_18_30218795 is on chromosome 18, within CTTNBP2 gene and upstream to CFTR gene. www.nature.com/scientificreports/ Several SNPs were identified analysing Brucella infection (Fig. 2), precisely, seven SNPs are found significantly associated with this disease.
Both MARC0040908 and MARC0029225 markers are on chromosome 9 within NTM gene. 250 Kbp upstream and 250 Kbp downstream of the aforementioned SNPs no genes are found ( Table 2).
The other five significant SNPs are located on chromosome 13: two of them, namely, MARC0024545 and ALGA0072626, are situated in a close region, sharing CMSS1 and FILIP1L genes, which are found upstream to the two SNPs. MARC0024545 is an intron variant because within COL8A1, while ALGA0072626 is upstream to the aforementioned gene. ALGA0073505 and WU_10.2_13_200912860 markers are in a genomic region where no characterized genes are present. ASGA0060211 is classified from VeP database as intron variant because it is positioned within TMPRSS2 gene. BACE2, FAM3B, MX1 and MX2 genes are upstream and RIPK4 is downstream to the SNPs. Figure 3 described the four significant associated SNPs for Leptospira. The H3GA005311 is located on chromosome 6, in a genomic window rich of genes: SPIRE1, PRELID3A, AFG3L2 and TUBB6 are upstream and CIDEA, IMPA2, MPPE1, CHMP1B, GNAL and TRNAG-UCC are Table 2. List of significant SNPs identified for each disease. The SNP name, the chromosome and pb positions, p-value, type of variant and the list of the candidate genes in the regions flanking the significant markers (± 250,000 nucleotides) are reported. STRING software was used to analyse the possible interactions among proteins encoded by identified genes. Figure 4 showed the results for each infection.
Three of the nine genes (JMJD6, MGAT5B and METTL23, Fig. 4a) for Pseudorabies infection are linked. Brucella phenotype showed the greater interactions among genes: two clusters are identified, the first between FILIP1L and CMSS1, and the second one among MX1, MX2, RIPK4, TMPRSS2, BACE2 and FAM3B genes (Fig. 4b). Although Leptospira was the infection with more genes found in the significant genomic window, few genes are linked to each other (IMPA2, MPPE1, and GNAL-TAF5 and USMG5, Fig. 4c).
Using the PANTHER software, it was possible to summarize the biological processes and molecular functions in which the identified candidate genes are involved (Table 3). For all the infections, the functional genes were enriched in "GO: 0065007, biological regulation", "GO: 0009987, cellular process", "GO: 0051179, localization" and "GO: 0008152, metabolic process". Furthermore, the functional genes of Leptospira were enriched in many other processes including "GO: 0000003, reproduction" and "GO: 0022414, reproductive process". More molecular functions were detected for genes involved in Leptospira infection than for Pseudorabies and Brucella infections.

Discussion
The seroprevalence estimate obtained for the three infections were higher than those observed in previous studies. In particular, the presence of PrV (48.9%) in our population is higher than what reported in the same area  The results of the GWAS suggest that some genes might have a role in the resistance/susceptibility to the studied bacterial and viral diseases in wild boar, but no gene was found common to the three infections.
The genomic window 250 Kbp downstream and upstream to the significant SNPs contained a variable number of genes, ranged from 9 (Pseudorabies) to 20 (Leptospira), as reported in Table 2. Interesting is that several SNPs are close to each other, suggesting being in linkage disequilibrium and probably fixed. Two pairs of SNPs were found very close in Brucella infection, ALGA0073505-WU_10.2_13_200912860 (~ 182 Kbp of distance) and MARC0024545-ALGA0072626 (~ 26 Kbp of distance). The closest SNPs are H3GA0042130 and ASGA0066225, identified in Leptospira analysis, distant less than 15 Kbp.
Seven genes associated with Pseudorabies and involved in multiple biological processes are located on chromosome 12. For instance, the MGAT5B gene (alpha-1,6-mannosylglycoprotein 6-beta-N-acetylglucosaminyltransferase B) is related to the Golgi apparatus operating at the intersection of the secretory, lysosomal, and endocytic pathways. Several studies 49,50 indicated that viral PrV envelopment and tegumentation occurs at late Golgi or post-Golgi compartments, suggesting that this gene may have a role in PrV virulence and dissemination. As reported in Fig. 4a, this gene is linked to other two genes of interest, the JMJD6 (JmjC domain-containing protein) and the METTL23 (methyltransferase like 23). The JMJD6 gene has many functions, ranging from a cell surface receptor for recognition of apoptotic cells to a nuclear factor responsible for lysine hydroxylation and arginine demethylation 51 . Recent reports indicating a multifactorial role in foot-and-mouth disease virus (FMDV) infection 52 , in tumorigenesis and virological interactions 53 . METTL23 belongs to a family of methyltransferase like proteins (METTL) that transfer methyl group to various substrates and it is involved in human intellectual disability 54 . The same functionality has been attributed to the MFSD11 (major facilitator superfamily domain containing 11) gene 55 .
On the same chromosome SEC14L1 gene was interestingly detected (SEC14 like lipid binding, also called PRELID4A) and it interacts with RIG-I, a cytosolic pattern recognition receptor, which has found to be required for the activation of anti-PrV activity 56 .
MXRA7 (matrix remodelling associated 7) is highly expressed in murine and human ocular tissues and might play a role in pathological processes or diseases involving injury, neovascularization and wound healing 57 . Another study 58 reports evidence for a protective role in the mouse psoriatic epidermis. Moreover, MXRA7 gene might be involved in bone marrow mesenchymal stem cells (BMSCs) functions 59 .
SRSF2 (serine and arginine rich splicing factor 2; also known as SFRS2) gene plays vital roles in a number of biological and pathological processes and it is associated in humans with the progression of a variety of diseases, including viral infection and tumorigenesis 60 .
On chromosome 18, CFTR (Cystic fibrosis transmembrane conductance regulator) and CTTNBP2 (cortactin binding protein 2) genes were detected. It is interesting the role of CTTNBP2 in cattle and humans. In Brown Swiss cattle breed this gene is involved in a recessive neurological disease 61 , i.e. the Bovine Progressive Degenerative Myeloencephalopathy (Weaver Syndrome). It is considered as a good candidate gene in humans for a role in the pathogenesis of mental retardation 62 such as autism-like behaviours 63 .
Genes linked to nervous system are probably expected to be found, because neurological signs predominate with Pseudorabies disease progression, e.g. ataxia, circling, paresis and paralysis 64 .
Several genes associated with host resistance or susceptibility to Brucella spp. have been identified in cattle 65 , buffalo 66 , goats 67,68 , humans and pigs 69 , from those different type of Collagenases were found. In this research the Collagen type VIII Alpha 1 chain (COL8A1) is identified. COL28A1 has been previously associated with antibody response in feral swine (Sus scrofa) infected with Brucella suis 69 , suggesting that COL8A1 might play an analogue role in wild board.

Infection
Biological process GO terms Molecular function GO terms www.nature.com/scientificreports/ Furthermore, recent studies have shown that MX1 inhibits the replication of foot-and-mouth disease virus (FMDV) (as JMJD6 found associated with Pseudorabies), and bovine viral diarrhoea virus (BVDV) 77 . Finally, porcine MX2 was also found to have the antiviral activity against Porcine reproductive and respiratory syndrome virus (PRRSV) 78 .
As STRING software highlighted, the Mx genes are flanked by BACE2, TMPRSS2, FAM3B and RIPKA genes. The TMPRSS2 (transmembrane serine protease 2) gene encodes a serine protease that can process the influenza A virus hemagglutinin into its fusion-competent state in human airway epithelial cells 79 and it is mainly involved in SARS-CoV and SARS-CoV-2 infections 80 .
The second cluster that STRING identified for Brucella, was formed by CMSS1 and FILIP1L genes, but no studies describing these two genes were found.
NTM (Neurotrimin) gene encodes a member of the IgLON (LAMP, OBCAM, NTM) family of immunoglobulin (Ig) domain-containing glycosylphosphatidylinositol (GPI)-anchored cell adhesion molecules. A study performed in humans suggested that NTM gene is associated with the level of the intelligence quotient (IQ) and genome wide association studies identified an association between NTM variation and cognitive function performances in humans 81,82 .
Among the genes associated with Leptospira disease here highlighted, several of them are previously found to be associated with human neurological disorders: IMPA2, GNAL, MPPE1 and AFG3L2. IMPA2 (CIDE-N domain-containing protein) gene has been associated with bipolar disorder, schizophrenia 83 and febrile seizure 73 . GNAL (G protein subunit alpha L) gene has been linked to bipolar disorder and schizophrenia 84 . MPPE1 (Metallophosphoesterase 1) clustered with IMPA2 and GNAL genes through STRING software (Fig. 4c). Dysregulation of protein phosphorylation and subsequent abnormal cellular signalling has been postulated to be involved in neuropsychiatric disorders, thus making MPPE1 a plausible biological candidate gene for bipolar disorder (BPD) 85 . AFG3L2 (AFG3 Like Matrix AAA Peptidase Subunit 2) gene is a candidate gene for hereditary spastic paraplegias or neurodegenerative disorders (https:// www. genec ards. org/).
These results are interesting because human patients affected from Leptospira reported neurological manifestations in only 10-15% of cases 86 . The associations here identified suggest that nervous system could be more involved in wild boar Leptospira infections than in humans.
Another interesting gene is SPIRE1 (KIND domain-containing protein) which belongs to the SPIRE family that emerged as a class of host cell factors that may affect the invasion process. Interestingly, SPIRE1 has been implicated in the infection of Salmonella typhimurium 87 .
Other two genes of interest are the PRELID3A (PRELI/MSF1 domain-containing protein, also called SLMO1) and TUBB6 (Tubulin Beta 6 Class V). In particular, the PRELI-like family proteins acted as lipid transporters and play an important role of embryonic and development lymphocyte differentiation. The PRELI-like family proteins have been proposed to involve many cellular functions including apoptosis, cellular lipid metabolism and cellular signalling and were correlated with several types of diseases and malignant tumours 88 . TUBB6 gene was found associated with muscle differentiation and regeneration 89 .
In mouse, CIDEA (Cell Death Inducing DFFA Like Effector A) gene regulates thermogenesis, lipolysis, and conservation of energy and it is considered to be a proapoptotic factor 90 . The SH3PXD2A gene was studied in mouse and humans defining it as a potential risk gene for orofacial clefting, indeed, Cejudo-Martin et al. 91 argued that disruption of the mouse SH3PXD2A gene was associated with complete cleft of the secondary palate in 50-90% of mutant mice.
CALHM3 (Calcium Homeostasis Modulator 3) gene together with CALHM1, form a complex to mediate rapid taste neurotransmission; indeed, genetic deletion of CALHM3 abolished sweet, bitter, and human taste perception 92 . Also, NEURL1 (neuralized E3 ubiquitin protein ligase 1) gene has many functions, in particularly it was related to the cellular process involved in reproduction in multicellular organism 93 . Moreover, NEURL1 has been associated with fat content in Nordic cattle breeds 94 , while in humans this gene was related to survival in Oesophageal adenocarcinoma (EAC) patients 70 .
On the same chromosome PCGF6 (polycomb group ring finger 6) was found, which plays an essential role in embryonic development of mice and in mouse fertility 95 .
In summary, significant SNPs were detected to be associated with viral and bacterial disease. Furthermore, among the 29 genes highlighted, 18 genes could be considered candidate genes for genetic resistance or susceptibility to diseases. Indeed, identified genes are implicated in viral (SEC14L1, JMJD6, SRSF2, TMPRSS2, MX1, MX2) bacterial (COL8A1, SPIRE1), and neurological disorders (MFSD11, METTL23, CTTNBP2, BACE2, IMPA2, MPPE1 and GNAL), or in the functions of the Golgi complex (MGAT5B), organelle where viral envelope is occurred. No candidate genes related to reproduction system were identified for Leptospirosis and Brucellosis, but it could be hypothesized that wild boar responses are slightly different from those reported on reared pigs and only with a greater sample size it would be possible to individuate the association. In addition, the interesting findings of genes not directly related to infection symptoms, are intriguing, suggesting that further studies are needed to better clarify the pathways of these diseases. Results presented here represent interesting areas for future research, validation studies and fine mapping of candidate genes involved in bacterial and viral infections in wild boar.

Methods
Statement of animal rights. The wild boars were not hunted for the purpose of this study and none of the authors were involved with the hunting. Animals were hunted following regional hunting laws (Regolamento di attuazione della legge regionale 12  and 2019-2020 hunting seasons (from November to January) were sampled. The study was part of the project PRA_2018_56 financed by University of Pisa and entitled "Evaluation of hygienic-sanitary and qualitative parameters of wild boars hunted in Tuscany and Liguria" 14,29,41 , which had the purpose to investigate the role of the wild boar in the epidemiology of some infectious diseases for livestock and humans. Animals were hunted in different areas in the provinces of Pisa (34, from 5 different localities), Siena (20, from 2 localities), Grosseto (35, from 11 localities), and Livorno (7, from 5 localities), characterised by the abundant presence of wild boars and other wild ungulates 96 . At postmortem examination, samples did not present relevant lesions related to infectious disease. During necroscopy, the kidneys were collected, and serum was extracted from the infraorbital cavities 97 .
Serological analysis for Pseudorabies, Brucella and Leptospira infections. Serum samples were analysed by ID Screen Aujeszky gB competitive kit detecting anti-gB PrV antibodies (ID.vet, Grabels, France). Test procedures and interpretation of results were performed according to the manufacturer's instructions, adopting the short serum incubation protocol. The optical density was measured by a plate reader (Multiscan FC; Thermo Scientific, Waltham, MA, USA) at 450 nm wave-length.
Rose Bengal Test (RBT) and complement fixation test (CFT) were employed to detect anti-brucella antibodies. RBT and CFT were performed as described by World Organization for Animal Health (OIE) (OIE 2016); antigens used in both tests were obtained from the ''Istituto Zooprofilattico Sperimentale dell' Abruzzo e del Molise G. Caporale, Teramo". Leptospira antibodies were detected by microscopic agglutination test (MAT) as previously described 11  Total DNA extraction for each sample of kidney was performed, starting from about 100 µl of homogenised tissue, according to the salting out procedure proposed by Armani et al. 98 , further modified and applied as Salting out reference protocol by Armani et al. 99  The SNPs quality control (QC) has performed with PLINK v.1.07 (http:// zzz. bwh. harva rd. edu/ plink/) and only autosomal SNPs with a call rate higher than 95%, a minor allele frequency (MAF) > 1% and with no extreme deviation from Hardy-Weinberg equilibrium (P value > 0.00001) were included in the analysis. Animals with more than 5% of missing genotypes were discarded. After QC, 42,431 SNPs mapped on the 18 porcine autosomes and 93 individuals were retained. The number of SNPs per chromosome is reported in Table 4.
Genome wide association study and gene set enrichment analysis. The analysis was carried out for each infection separately, evaluating healthy vs infected animals. The association analysis was carried out with GenABEL 100 , which performs a simple linear regression marker-phenotype analysis. Firstly, the genomic relationship matrix was calculated with the function ibs (https:// rdrr. io/ cran/ GenAB EL/ man/ ibs. html), where for a given pair of individuals i and j, the identical by state coefficients (f i, j ) is calculated as follows: where N is the number of markers used, x i, k is the genotype of the ith individual at the kth SNP (coded as 0, ½ and 1), p k is the frequency of the reference allele and k = 1,…, N.
Then, the additive polygenic model described below was applied: Each phenotype has been analysed separately (affected: Pseudorabies 42; Leptospira 31; Brucella 15); β was a vector with the fixed sex effect and X was the incidence matrix that associated each observation to levels of factor in β. The random effects in the model were the animal and the residual, which were assumed normally distributed as α∼N 0, Gσ 2 g and e∼N 0, Iσ 2 e , where G was the genomic relationship matrix, I is an identity matrix, and σ 2 g and σ 2 e are the additive genomic and residual variances, respectively. Regression was performed using the GenABEL function mmscore and the associations between marker and phenotype with a P value ≤ 5 × 10 -5 were considered significant 101 . The aforementioned threshold was used considering that the study was carried out on wild boar, which has not a species-specific SNP chip. Moreover, this level of significance for association is called "suggestive", and it was introduced by Lander e Kruglyak 101 and it is widely used in GWAS [102][103][104][105] . For each trait, a Manhattan plot and a quantile-quantile (Q-Q) plot were produced using the R VeP database (https:// www. ensem bl. org/ info/ docs/ tools/ vep/ index. html) was used to investigate the type of significant SNPs. A genomic window of 250 Kbp upstream and downstream from the significant SNP for each trait was investigated using the R package biomaRt 106,107 , which accesses the data available in Ensembl database (https:// www. ensem bl. org). The genes identification was based on Scrofa 11.1 assembly (https:// www. ensem bl. org/ Sus_ scrofa/ Info/ Annot ation). For the Gene set enrichment analysis, the lists of protein coding genes were uploaded to STRING 11.5 108 and PANTHER v.16.0 109 .

Data availability
The datasets analyzed during the current study are available online on the following link: https:// osf. io/ bz89n/.  Table 4. Total number of SNPs before quality control (pre-QC), post quality control (post-QC) and the percentage of SNPs retained for each autosomal chromosome.

Chromosome
Total markers pre-QC Total markers post-QC %Markers post-QC