Integrative biology defines novel biomarkers of resistance to strongylid infection in horses

The widespread failure of anthelmintic drugs against nematodes of veterinary interest requires novel control strategies. Selective treatment of the most susceptible individuals could reduce drug selection pressure but requires appropriate biomarkers of the intrinsic susceptibility potential. To date, this has been missing in livestock species. Here, we selected Welsh ponies with divergent intrinsic susceptibility (measured by their egg excretion levels) to cyathostomin infection and found that their divergence was sustained across a 10-year time window. Using this unique set of individuals, we monitored variations in their blood cell populations, plasma metabolites and faecal microbiota over a grazing season to isolate core differences between their respective responses under worm-free or natural infection conditions. Our analyses identified the concomitant rise in plasma phenylalanine level and faecal Prevotella abundance and the reduction in circulating monocyte counts as biomarkers of the need for drug treatment (egg excretion above 200 eggs/g). This biological signal was replicated in other independent populations. We also unravelled an immunometabolic network encompassing plasma beta-hydroxybutyrate level, short-chain fatty acid producing bacteria and circulating neutrophils that forms the discriminant baseline between susceptible and resistant individuals. Altogether our observations open new perspectives on the susceptibility of equids to strongylid infection and leave scope for both new biomarkers of infection and nutritional intervention.

Infection by gastro-intestinal nematodes is a major burden for human development worldwide as they both affect human health 1 and impede livestock production 2 . Worldwide reports of anthelmintic drug failures against nematodes of veterinary importance have accumulated 3 , threatening the sustainability of livestock farming in some areas. The same pattern applies in horses whereby widespread benzimidazole and pyrantel failures against cyathostomin populations have been reported [4][5][6][7][8] . In addition, reduced efficacy of macrocyclic lactones has been observed in the USA 6 , Brazil and western Europe 9 .
These small strongyles are mostly responsible for growth retardation in young animals, diarrhoea being frequently reported among clinical symptoms, along with colic 10,11 . Hosts become infected through ingestion of infective third stage larvae on pasture. These L3 larvae then migrate to their host caecum where they enter the mucosa and submucosa to complete molting into the fourth larval stage (L4), after two distinct stages (early and late L3) 12 . Duration of this process can vary in time, up to two years, as a result of an optional hypobiosis at the early L3 stage 13 . Following molting, they will resume their development to the adult reproducing stages in the lumen of the caecum and colon. The massive emergence of developing larval stages from the caeco-colic mucosa can cause a larval cyathostominosis syndrome 14 . This seasonal syndrome is characterized by the sudden onset of chronic diarrhoea associated with oedema and weight loss 14 , and remains a leading cause of parasitemediated death 15 .
Factors contributing most to the selection of drug-resistant cyathostomin populations in equids remain uncertain 6,7 . However, significant and heritable between-horse variation in strongylid egg excretion has been reported in both domestic 16,17 and wild horse populations 18 . These observations hence suggest that some horses are able to limit parasite egg production, the mechanisms of which remain to be uncovered but likely apply to two broad categories, i.e. reduction of overall worm burden or reduction of worm fertility as found for ruminant www.nature.com/scientificreports/ trichostrongylids 19 . While Faecal Egg Count (FEC) is a poor correlate of actual worm burden in horses 20 , it can serve as a marker trait for horse resistance quantification as reported elsewhere 21 . As such, horse resistance to strongylid infection is defined as the ability to control strongyle life-cycle. This expands the strict concept of host resistance-limiting parasite worm burden 22 -by accounting for epidemiological consequences of reduced pasture contamination with free-living stages. It is also the only way resistance could be tracked as measuring strongyle worm burden is hampered by ethical and material limitations associated with the culling of large numbers of horses otherwise. Nonetheless, heritable variation in FEC leaves scope for restricting drug application to the most susceptible horses, thereby alleviating the selection pressure on parasite populations. To date, the genetic architecture of this trait has not been defined in equids, although indications from ruminant species would be in favour of a polygenic architecture [23][24][25] defining a stronger type 2 cytokinic polarization in resistant individuals 26,27 . Identifying biomarkers of this intrinsic resistance would both contribute to understanding the host-parasite relationship and to defining relevant biomarkers for use in the field. Current targeted-selective treatment schemes are based on FEC that has suboptimal sensitivity and remains time-consuming despite recent advances that should ease egg detection 28 . As a result, its uptake in the field varies widely across countries and remains limited 7,29 despite being cost-effective 30,31 . To date, limited alternative biomarkers have been identified. Alteration in serum albumin level and decrease in circulating fructosamine were the main features found in cyathostomin infected ponies 32 . Independent observations concluded that mixed strongyle infection was associated with mild inflammatory perturbations 33 . We previously highlighted that susceptible ponies had lower monocyte but higher lymphocyte counts than resistant individuals upon natural strongylid infection 34 . More susceptible individuals also exhibited differential modulation of their faecal microbiota, including enrichment for the Ruminococcus genus 34 , corroborating independent observations of alterations in the gut microbiota composition of infected horses [34][35][36][37] .
In horses as in other host-parasite systems, limited efforts have been made to isolate compositional shifts in plasma metabolites following parasite nematode infection. Beyond murine models of helminth infection 38,39 , implementation of this technology could define a urinary biomarker of infection by Onchocerca volvulus in humans 40 . This was however not reproduced in other cohorts of patients 41 . In livestock species, a single study has applied metabolomic profiling on horse faecal matter to identify biomarkers of infection by parasitic nematodes but found little differences between horses with contrasting levels of strongylid infection 35 .
In any case, these observations remain limited to individual host compartments and do not provide an integrated perspective of the physiological underpinnings associated with susceptibility to infection. Because we are aiming to distinguish between individuals before the onset of clinical signs, the biological signals may be subtle. To this respect, integration of systems biology data-that consider multiple high-dimensional measures from various host compartments-is expected to better identify the multiple features defining a given physiological state 42 .
Under this assumption, we combined and analysed metabolomic, metagenomic and clinical data collected on a selected set of intrinsically resistant and susceptible ponies to identify the physiological components underpinning their resistance potential. Our data defined a strongylid infection signature built around lower circulating monocytes, enriched plasma phenylalanine concentration and higher Prevotella load in faecal microbiota that we could replicate in independent cohorts. We also identified an immunometabolic signature centered on neutrophils that best discriminated between resistant and susceptible individuals across strongyle-free or natural infection conditions. These results begin to define the physiological bases supporting the intrinsic resistance potential to cyathostomin infection in equids.

Results
This experiment was based on a set of individual Welsh ponies with divergent resistance to strongyle infection, measured as their ability to limit eggs excretion. During the experiment (2015), we produced metabolomic data that we present herein and integrated this plasma-related dataset with previously described faecal bacteria profiles and clinical parameters 34 . We analysed the data produced for each group under worm-free conditions (day 0) or following natural infection (day 132) to (1) identify biomarkers of infection, (2) establish a holistic view of the physiological underpinnings of the resistance potential to strongyle infection in horses.

Pony divergence toward strongyle infection is significant and sustained. We selected 20 female
Welsh ponies with divergent susceptibility to strongylid infection. Their susceptibility potential was predicted from their past FEC history (at least three FEC records over two years between 2010 and 2015). During the experiment (2015) and following natural infection (day 132), 17 ponies displayed FEC values in good agreement with their predicted potential. In that case, 8 of the susceptible ponies (TS) had FEC above the considered 200 eggs/g cut-off for treatment (average FEC = 419 ± 149 eggs/g) at day 132, and 9 resistant ponies (TR) were below this threshold (average FEC = 56 ± 77 eggs/g). Our prediction hence achieved an accuracy of 85%. Other ponies that did not match expectations had either higher susceptibility (850 eggs/g for the predicted resistant individual) or too low FEC (0 and 50 eggs/g for the two susceptible individuals).
In agreement with this observation, FEC measured in the TS and TR ponies during the five years preceding the experiment departed significantly from the herd mean (0.6 standard deviations ; P = 0.01 and P = 0.02 for TS and TR groups respectively; Fig. 1). To ensure that their intrinsic potential was true, we compared their FEC records collected after the experiment took place (between 2015 and 2020) to that of their herd (1436 individual records). We confirmed that their divergence was sustained (− 0.76 and + 0.63 standard deviation from mean in TS and TR ponies respectively; P = 0.02 in both cases) throughout the following years ( Fig. 1), hence validating their intrinsic potential. This corresponded to an average FEC of 43 eggs/g (min = 0; max = 2200 eggs/g) Metabolomic profiling highlights the association between plasma phenylalanine level and FEC. To identify markers of pony intrinsic resistance potential, we measured variation in their plasma metabolites using 1 H-NMR between worm-free conditions (day 0) and after natural infection (day 132). This metabolomic profiling of TR and TS ponies throughout the grazing seasons found a total of 791 metabolic buckets, corresponding to 119 unique metabolite signals (Fig. 2, Supplementary Table 1). These included several amino acids, energy metabolism-related metabolites, saccharides, unknown lipids, and organic osmolytes in the plasma. Among these signals, we observed 29 unassigned bins in three main windows ranging between 1.115 and 1.435 ppm, 3.385 and 4.305 ppm or 6.805 and 7.895 ppm (Supplementary Table 1).
Despite the lack of systematic modifications of metabolomes between TR and TS ponies, we sought to identify individual metabolites that would reflect the intrinsic susceptibility status of ponies before any infection took place (day 0) or metabolites that would differentiate individuals in need of treatment at the end of the grazing  www.nature.com/scientificreports/ season (day 132). Considering the nominal P values of 5%, we identified differences in dimethyl-sulfone and lysine associated signals that were all decreased in the TS pony group ( Supplementary Fig. 2a). None of these signals were however significantly correlated with final FEC measured at day 132 (Spearman's ρ ranging between − 0.08 and 0.34, Supplementary Fig. 2b). The same approach applied to TR and TS ponies after natural infection at day 132 found lower levels of phenylalanine ( 1 H-NMR signals at 3.285-3.305 ppm and 7.345-7.445 ppm) in TR compared to TS ponies (nominal P-value = 6 × 10 −3 ). In addition, an unidentified metabolite between 6.815 and 6.815 ppm (U13) was significantly lower in TR (P = 0.04). Phenylalanine signal intensities increased with FEC (Spearman's ρ ranging between 0.55 and 0.60, P < 0.05, n = 17). Using the faecal metabolomic data from another independent set of British horses 35 , we could validate this signal. In that study and in line with our results, faecal phenylalanine level was significantly increased in horses with higher FEC (Wilcoxon's test = 27, P value = 0.02, Fig. 3C).
Overall these results indicate that strongylid infection experienced by TR and TS ponies did not induce metabolome-wide modifications. However, phenylalanine was the most discriminant between TR and TS ponies under infection and stands as a biomarker of the need for treatment.

Multi-compartment data integration identifies circulating monocytes, phenylalanine and faecal Prevotella levels as discriminant features under infection.
To mine the physiological differences between infected TR and TS ponies more deeply, we applied a data integration framework (sGCC-DA) bringing together clinical, metabolomic and previously analyzed faecal microbiota data from these individuals 34   Additional features included higher monocyte counts (1.178 ± 0.23 and 1.038 ± 0.24 million cells/mm 3 in TR and TS ponies on average) and higher levels of plasma alkaline phosphatase (ALP; 5.39 ± 0.15 and 5.25 ± 0.12 Units/L) in TR ponies (Fig. 4). However, these parameters displayed least covariation with other parameters (Fig. 4).
Analysis for KEGG pathway enrichment of metabolite signals found significant over-representation of Aminoacyl-tRNA biosynthesis (FDR = 1.1 × 10 −4 ) underpinned by the presence of alanine, lysine, phenylalanine, serine, and tryptophan (Supplementary Table 2). Of note, variations in alanine and phenylalanine plasma levels were similar to what was previously reported for dengue fever 43 , another-yet viral-infectious process (Supplementary Table 2). The most significant enrichment was defined by the Blautia, Coprococcus and Ruminococcus association found in forms of pediatric Crohn's disease 44  To validate the association of the most discriminant blood and bacterial features with FEC, we used either a previously published horse data set 35   www.nature.com/scientificreports/ counts were also negatively correlated with FEC levels in the independent set of ponies (Spearman's ρ = − 0.31, P value = 0.14; Fig. 5).
These findings hence retain increased Prevotella abundance in faecal material as a conserved signal in equids with reduced FEC, and monocyte counts appears to be a good predictor of FEC level.
Short-chain fatty acid producing bacteria, plasma lysine and circulating neutrophils recapitulate pony intrinsic potential across conditions. Using the same analytical framework, we aimed to identify features that would best define the intrinsic pony resistance potential across worm-free or natural infection conditions. www.nature.com/scientificreports/ The sGCC-DA applied on records measured under worm-free conditions (day 0) identified reduced circulating neutrophil and leukocyte counts in the TR ponies as the most discriminant clinical parameters (Supplementary Figs. 5, 6, and 7). Cell counts of these two populations were tightly linked with plasma levels of 1-methylhistidine (7.785 ppm; reduced in TR ponies), lysine (1.445-1.465 and 1.845-1.915 ppm; increased in TR ponies) and β-hydroxybutyrate (4.125-4.135 ppm; increased in TR ponies). In addition to these parameters, maximal covariance was obtained for a few genera from the Actinomycetia class (order Corynebacteriales), namely Dietzia, Gordonia, Mycobacterium and Sacharopolyspora (order Pseudonocardiales), that all showed higher relative abundances in resistant ponies, as well as the candidate genus BF311 within Bacteroidetes (i.e. BF311; Supplementary Fig. 5). On the opposite, higher relative abundance of butyrate-producing Clostridia, specifically Anaerofustis, Coprococcus and Ruminococcus were found in TS ponies ( Supplementary Fig. 5). These genera showed positive correlations with circulating lymphocytes and neutrophil counts (ranging between 0.  Table 2).
Combining these differential features between pony groups under worm-free conditions with that found after strongylid infection retained a core set of seven markers consistently discriminating TR and TS ponies across infection conditions (Supplementary Fig. 8). For these markers, we aimed to identify differential trends between both groups across infection conditions.  Table 3). This trend was also not corroborated in another independent set of individuals (Fig. 5).
Second, plasma lysine level was significantly different between infected TR and TS ponies (P = 0.002) and matched independent observations made in faecal samples from another cohort of British horses (Fig. 6). Our temporal records also supported a differential decrease of plasma lysine levels between TR and TS ponies from worm-free to natural infection conditions (P = 0.02).
Lysine and Fibrobacter would hence mark the intrinsically higher sensitivity of TS ponies.

Discussion
Inter-individual variation in resistance to strongylid infection offers the opportunity to reduce drug selection pressure by selectively treating the most susceptible individuals. To date, the physiological markers associated with the host resistance are to be defined. Here, we selected resistant and susceptible ponies based on their FEC and confirmed that their divergence in egg excretion was sustained over a 10-year time window. Using this unique set of individuals, we performed the first detailed investigations of the variations occurring in three different compartments (blood, plasma and faeces) to identify the underpinnings of their differential resistance to strongylid infection. Our findings are twofold. First, we found support for defining phenylalanine, circulating monocytes and faecal abundance of Prevotella as biomarkers of patent strongylid infection in equids. Second, we identified an immunometabolic network encompassing β-hydroxybutyrate level, short-chain fatty acid producing bacteria and circulating neutrophils that recapitulates the intrinsic resistance of ponies to patent infection under both worm-free and natural infection conditions. For practical purposes, resistance was defined as the ability to limit egg excretion in the host as precise worm burden quantification after pony culling was not possible. To remove as many encysted larvae as possible, we applied a moxidectin treatment four months before the start of the trial. This was sufficient to ensure negative FEC in all but one susceptible individual at day 0. This indicates that some encysted larval stages had resumed their life-cycle in the meantime, as a result of the imperfect efficacy of that drug against this particular life-stage stage (between 60 and 95% efficacy according to studies) 45 . While the exact contribution of this life-stage to the reported parameter variations remains unknown, their putative presence would not affect our conclusions. Indeed, if they contribute to observed FEC or if ability to limit larval burden and FEC are well correlated traits, the experimental design would account for their contribution. However, if variation in larval burden across individuals is independent from their ability to control FEC, reported parameter variations would be partially confounded by the presence of developing larval stages. Current knowledge cannot however disentangle this point.
Elevation of plasma phenylalanine concentration has long been recognized as a metabolic consequence of bacterial and viral infections 46 . Similar observation has also been made in the faecal matter of infected horses 35 and we could replicate it from plasma samples. Of note, our data highlighted the covariation of this essential amino acid with average daily weight gain and the concomitant increase in plasma alkaline phosphatase concentration. This is compatible with strongyle infection reducing muscle protein synthesis (in line with observed reduced average daily weight gain), thereby increasing the extracellular release of phenylalanine and its subsequent uptake by the liver (whose activity was tracked by the increased alkaline phosphatase level). This model would match the theoretical framework derived from other infectious processes 46 . Altogether, plasma phenylalanine represents another diagnostic option for strongylid infection. The sensitivity and specificity of this marker remains to be determined in a larger cohort for field use in association with monocyte count and faecal Prevotella abundance.
The concomitant increase in the plasma level of this aromatic amino acid with the monocyte count decrease upon infection in susceptible individuals also opens new perspectives on the pathophysiology associated with strongylid infection in equids. Indeed, observations in humans suggested that an antibacterial compound derived from phenylalanine and produced by the gut microbiota could alter monocyte cell migration 47 . Specifically, D-phenyllactic acid can favour monocyte recruitment through the binding of their ydroxy-carboxylic acid (HCA) 3 G-protein coupled receptor 47 . While the only lactate-associated HCA1 receptor exists in equids, the possibility remains that phenylalanine derivatives could recruit immune cells 48 . Under this speculative model, serum phenylalanine (increased in infected individuals), would enter the gut lumen before subsequent transformation into phenyllactic acid by gut bacteria. Bacteria-derived phenyllactic would then be absorbed into the portal vein and serve as a recruitment signal for circulating monocytes. Further studies on this receptor would confirm its role in any differential immunometabolic regulations between the resistant and susceptible pony groups.
Under patent infection, susceptible individuals exhibited decreased abundance of faecal Prevotella. Effects of the Prevotella genus are still debated 49 . Experimental colonization of germ-free mice by Prevotella promoted the decrease of IL-18 interleukin expression, neutrophil recruitment at the site of infection and gut inflammation 49 . Prevotella was also associated with susceptibility to Plasmodium infection in mice 50 , increased in patients infected by the human trematode S. haematobium 51 and in the colon of pigs infected by Trichuris suis 52 . A Prevotellaled inflammatory state would hence define the reduced strongyle infection observed in resistant ponies. This would contradict past findings in mice showing the detrimental association between IL-18 and Trichuris muris infection 53 . Because this cytokine seems to be sensitive to its environment, there is scope for microbiota-based regulation 54 that would differ between resistant and susceptible individuals.
In our attempt to discriminate between resistant and susceptible ponies, we identified a feature network with contrasting proinflammatory abilities. First, resistant ponies exhibited higher plasma lysine levels across conditions. Lysine is an essential amino acid for metabolism and immune response. Although the role of its derivative on the immune system is yet to be clarified 55 , it is a natural ligand of the G protein-coupled receptor family C group 6 (GPRC6) receptor 56 that can modulate Th-2 response and antibody production by B cells 57 . Second, butyrate-producing bacteria and other bacteria able to metabolize β-hydroxybutyrate into butyrate (i.e. Coprococcus) defined differential baselines between pony susceptibility groups under worm-free conditions. The former is known to favour an anti-inflammatory state by inhibiting the neutrophil inflammasome and the release of pro-inflammatory cytokines like IL-1β and IL-18 58,59 . Butyrate binds free fatty acid receptors, like FFAR2 that is enriched on neutrophil cell surface 48 , thereby promoting their intestinal recruitment 60 . Third, faecal Fibrobacter was a core discriminating feature of the intrinsic resistance potential. The drastic reduction of faecal Fibrobacter abundance in susceptible ponies upon strongylid infection mirrored past observations made in pigs infected with Trichuris suis, for which the reduction occurred irrespective of the worm load 61 . In goats, abundance of this genus was negatively correlated with the pro-inflammatory cytokines TNFɑ that is produced by monocytes 62  www.nature.com/scientificreports/ another short-chain fatty acid able to modulate the recruitment of monocytes and neutrophils 48 . The reduction of this bacterial genus could hence result in a pro-inflammatory state favourable to the development of strongyle infection in susceptible ponies. This immuno-metabolic network tying the host metabolites with its gut microbiota also points towards a key role of neutrophils in the definition of strongylid susceptibility. In line with this strand of evidence, we found a consistent covariation between circulating neutrophils and intrinsic susceptibility to strongylid infection. This relationship was however not validated in another independent population with higher excretion levels than that found during our 2015 experiment. The role played by neutrophils hence remains to be fully characterized as for other host-helminth models 63,64 . Transient neutrophilia (in the range of 9 × 10 9 cells per L) was previously described as the only haematological variation in ponies subjected to experimental infection by cyathostomins 32 . But neutrophils are not typically associated with type 2 immunity-that is effective against helminths 65 -and they do not play a role against the infection by Trichinella spiralis, a clade I nematode 66 . However, neutrophils can cooperate with macrophages to bring nematode infection under control as found for Litomosoides sigmodontis 67 , Strongyloides stercoralis 68 and Heligmosomoides polygyrus 69 . They also appear to be regulated by the type-2 cytokinic environment to prevent damages to the host 63 . In horses, neutrophils respond to stimulation by IL-4-a type 2 cytokine-by an increase in the pro-inflammatory cytokines TNF-ɑ and IL-8 but a decrease in IL1-β 70 that could be detrimental to the anti-helminth response. The covariation between neutrophil counts and the enhanced susceptibility in horses may thus result from qualitative differences. Application of single-cell RNAseq on neutrophil populations from the two pony backgrounds could provide significant advances in the understanding of their respective properties. Quantification of cytokines and reactive oxidative species production after in vitro exposure to infective strongylid larvae could also unravel distinct properties and putative bias toward a more effective response in resistant individuals.
Overall, this work suggests that FEC records from at least two years are sufficient to define the resistance potential of an equid to strongyle infection. It also proposes phenylalanine, monocyte counts and faecal Prevotella abundance as biomarkers of infection. Last, we identify a neutrophil-centered network tying together gut microbiota members and a few metabolites as the discriminant baseline between susceptible and resistant individuals. These results open novel perspectives for the understanding of strongylid susceptibility in equids and for nutritional modulation of these infections.

Materials and methods
This study relied on measurements gathered during a previous study 34  Selection and monitoring of 20 ponies with divergent susceptibility profile to strongyle infection. Ponies were selected from the experimental herd (INRAE UE PAO, Nouzilly, France) according to their FEC history recorded since 2010. They usually graze from April to November according to meteorological conditions and maintained indoors under worm-free conditions (240 m 2 pen with slatted floors) otherwise. Because worm burden could not be determined under our setting, we considered resistance as the ability to limit egg excretion and select ponies based on that criterion. Briefly, individual pony random effect was estimated from individual records after correction for environmental fixed effects including year, season, age at sampling, time since last treatment and last anthelmintic drug received. This was implemented only for ponies on which data was collected three times a year, over at least two years. Using these estimates, the ten most susceptible (S) and ten most resistant (R) ponies were retained for this study. Median FEC over the past 5 years were 800 and 0 eggs/g for S and R ponies respectively 34 . These two groups of female Welsh ponies were matched for age (mean group ages of 3.63 years ± 1.33 and 4.44 years ± 1.30 in the R and S groups respectively).
Ponies were housed in November 2014. While indoors they received hay ad libitum and 600 g concentrate per animal per day (Tellus Thivat Nutrition Animale Propriétaire, Saint Germain de Salles, France) as outlined previously 34 . This concentrate contained oats and oat bran, wheat straw, barley, alfalfa, sugar beet pulp, molasses, salt and calcium carbonate and was enriched with vitamins and minerals 34 . To remove any intercurrent nematode infection, they were administered moxidectin and praziquantel (Equest Pramox ® , Zoetis, Paris, France, 400 μg/kg of body weight of moxidectin and 2.5 mg/kg of praziquantel) on February 20th 2015 and kept indoors until they were turned out on pasture on June 20th 2015. Then, they remained on a 7.44 ha pasture from that date (day 0) to the end of October 2015 to observe the effects of natural infection in both groups (day 132). Pasture botanical analysis identified Festuca arundinacea, Phleum prateonse, Poa abbreviate, Holcus lanatus, and Dactylis glomerata.
At day 0 (4 months after moxidectin treatment), a single pony was FEC-positive (150 eggs/g). Although we did not explicitly determine egg reappearance period, this supports high efficacy level for moxidectin in this population in line with early observations after moxidectin release to the market 72 . It is however likely that some encysted larvae remained before the start of the pasture season. Nonetheless and because we aimed for a comparison between R and S groups at the end of pasture season, we decided to treat all ponies to reset their egg excretion. This was thought of as a conservative measure towards our hypothesis that would prevent unfair comparison between the S groups that harboured two FEC-positive individuals early in the season relative to the R group. Pyrantel pamoate (Strongid ® paste, Zoetis, Paris, France; single oral dose of 1.36 mg pyrantel base per kg of body weight) was chosen as it is effective (horses were all FEC negative 20 days after treatment on July 30th 2015) and short-lived, thereby leaving scope for reinfection and FEC measure in late summer 72 .
To identify biomarkers of infection, blood samples were taken at 0 and 132 days after the start of the trial to compare pony metabolomes before (indoors and under worm-free conditions) and after the pasture season  55358 88072 27249 8e12). Haematological parameters were determined after 15-min stirring at room temperature with a MS9-5 Haematology Counter ® (digital automatic haematology analyzer, Melet Schloesing Laboratories, France). Recorded parameters encompassed erythrocyte, micro-and macro-erythrocyte counts, associated mean corpuscular volume, haematocrit, and mean corpuscular haemoglobin concentration. In addition, circulating thrombocytes were counted as well as leukocytes including lymphocytes, monocytes, neutrophils, basophils and eosinophils. For further analysis, an albumin to globulin ratio (AGR) was also performed.
In addition, serum biochemical parameters were quantified using Select-6 V rings with the M-Scan II Biochemical analyser (Melet Schloesing Laboratories, France). These parameters included albumin, cholesterol, globulin, glucose, alkaline phosphatase (ALP), total proteins (TP) and urea concentrations.
Normality of haematological and biochemistry parameters was tested with the Shapiro-Wilk's test (shapiro. test() function in R) and variables showing test value below 0.90, i.e. FEC, glucose, ALP and AGR concentrations, and eosinophil count were log-transformed (Supplementary Table 4). Validation of monocyte and neutrophil counts were obtained from a group of 25 ponies in October 2020 using the same setting.

Proton nuclear magnetic resonance ( 1 H NMR) data acquisition and processing.
Blood was collected in heparin coated tubes. Whole blood drawn for plasma generation was refrigerated immediately at 4 °C to minimize the metabolic activity of cells and enzymes and kept the metabolite pattern almost stable. After clotting at 4 °C, plasma was separated from blood cells and subsequently cryopreserved at − 80 °C and shipped as a single batch for 1 H-NMR profiling. Plasma samples were thawed on ice, 200 µL were mixed with 500 µL of deuterium oxide (D 2 O) containing sodium trimethylsilylpropionate (TSP, 1 mM). Samples were vortexed, centrifuged (5500 g; 10 min; 4 °C) and 600 µL of supernatant were transferred into 5 mm NMR tubes.
The 1 H-NMR analysis was performed on a Bruker Avance III HD spectrometer (Bruker, Karlsruhe, Germany) operating at 600.13 MHz, and equipped with a 5 mm reversed 1 H-13 C-15 N-31 P cryoprobe connected to a cryoplatform. 1 H NMR spectra were acquired using a Carr-Purcell-Meiboom-Gill (CPMG) spin echo pulse sequence with a 2-s relaxation delay. The spectral width was set to 20 ppm and 128 scans were collected with 32 k points. Free induction decays were multiplied by an exponential window function (LB = 0.3 Hz) before Fourier Transform.
Spectra were manually phase and baseline corrected using Topspin 3.2 software (Bruker, Karlsruhe, Germany). All spectra were referenced to TSP (d 0 ppm). The spectral data were imported in the Amix software (version 3.9, Bruker, Rheinstetten, Germany) to perform data reduction in the region between 9.0 and 0.5 ppm with a bucket width of 0.01 ppm. The region between 5.1 and 4.5 ppm corresponding to water signal was excluded and data were normalized to the total intensity of the spectra. 1 H-NMR data were filtered from noise-related signals and finally consisted in 791 buckets ranging from 8.585 to 0.505 ppm (Supplementary Table 1). Buckets were annotated based on similarity of chemical shifts and coupling constants between plasma samples and reference compounds. The comparison was performed between one dimensional analytical data and reference compounds acquired under the same analytical conditions in our internal database, as well as from public databases like the Human Metabolome Database (HMDB, http:// hmdb. ca/) and the Biological Magnetic Resonance Data Bank (http:// www. bmrb. wisc. edu/). Manually curation was performed to isolate buckets corresponding to the same metabolite. Intensities of these metabolite-specific buckets were summed together to define metabolite-associated signals (hereafter referred to as "signals"), yielding 119 signals for further analysis.
To prevent spurious signals linked to age differences between individuals, metabolite levels at D0 were regressed upon pony age. None of the 119 metabolite signal intensities showed variation associated with pony ages at the considered cut-off (FDR < 5%).
16S faecal microbiota data for data integration. Total microorganism's DNA was extracted from aliquots of frozen fecal samples (200 mg), using E.Z.N.A. ® Stool DNA Kit (Omega Bio-Tek, Norcross, Georgia, USA). The V3-V4 16S rRNA gene amplification and sequencing have been described elsewhere 34 .
For data integration purposes (described in next paragraph), 16S rRNA gene sequencing data generated from these ponies were re-analysed using qiime2 v.2020.2 74 . Adapters and primers were removed from sequencing data using cutadapt v2.1 75 . Trimmed data was subsequently imported into qiime2, and denoised using the Divisive Amplicon Denoising Algorithm from dada2 76 . In this workflow, Operational Taxonomic Units (OTU, sequence cluster defined by their dissimilarity level) are replaced by so-called amplicon sequence variant (ASV) that matches observed genetic variation in microorganism 16S rRNA gene amplicons instead of relying on a clustering operation 77 . Reads were quality filtered (maximal expected error of 1 for both reads), chimera trimmed (following the default "consensus" option) and the reads trimmed (6 and 20 bp for forward and reverse reads www.nature.com/scientificreports/ respectively) to yield 3,054,206 reads. ASVs were subsequently filtered to retain that found in at least two individuals and supported by five reads, before alignment and phylogeny building with mafft and fasttree respectively. ASVs were subsequently assigned taxonomy using a naive Bayes classifier trained on the green genes reference database (gg_13_8) clustered at 99% similarity. This workflow left 6208 ASVs that were aggregated to 91 genera with phyloseq (v.1.32) for subsequent analysis. Total sum scaling normalization was applied to each taxon abundances for subsequent data integration analysis. The raw sequences of the gut metagenome 16S rRNA targeted locus are available in NCBI under the Sequence Read Archive (SRA), with the BioProject number PRJNA413884 and SRA accession numbers from SAMN07773451 to SAMN07773550.

Statistical analysis and data integration.
Our experiment focused on resistant and susceptible ponies monitored during a pasture season. Resistance in this context was defined as the ability to limit parasite egg excretion. Using this experimental design, we interrogated their respective metabolomes (1) to characterize metabolite trajectories throughout a single pasture season (June 2015 to Octobre 2015), (2) to identify biomarkers that would predict the pony intrinsic resistance to strongyle infection, (3) to identify biomarkers that could be used to differentiate between ponies reaching FEC cut-off for anthelmintic treatment (200 eggs/g) at the end of pasture season. We restricted this analysis to ponies whose predicted resistance status matched the observed FEC value at the end of pasture season, leaving nine true resistant (TR) and eight true susceptible (TS) ponies respectively. Statistical analyses applied for each of these three objectives are presented below.
Multivariate analysis of longitudinal metabolomic data in TR and TS ponies. First, we aimed to characterize metabolomic modifications occurring in TR and TS ponies throughout a grazing season. To analyse our longitudinal data, we implemented an ANOVA-Simultaneous Component Analysis (ASCA) 78,79 with the Metabo-Analyst R package v.3.0.3 80 . This analysis first partitions the variance contained in the metabolite data across the factors of interest (susceptibility group and time point including 0, 24 or 132 days after onset of the pasture season) and their interactions, thereby correcting the data for these effects. Simultaneously, a PCA is applied to each partition for dimensionality reduction ultimately isolating the metabolites contributing most to each effect.
Significance of each effect was tested by 1000 permutations. Following ASCA, 37 outlier metabolite signals were identified and subsequently removed from further data integration analysis leaving 319 signals for subsequent analysis.
Differential analysis of signal metabolite intensities between TR and TS ponies and between ponies in need of treatment. Second, we aimed to identify biomarkers that would either be predictive of the intrinsic resistance potential of an individual (TR vs. TS comparison at day 0) or would distinguish between individuals in need of treatment (TR vs. TS comparison at day 132). To fulfill these two aims, we respectively performed Student's t tests on the 319 retained metabolite signals between TR and TS pony baselines or between individuals showing FEC below or above 200 eggs/g at day 132. To account for multiple testing, nominal P-values were adjusted using the Benjamini-Hochberg correction (FDR) as implemented in the p.adjust function (stats package v.4.0.2). Spearman's correlations were estimated using the rcorr function from the Hmisc package v.4.4-1.
Data integration to identify biomarkers associated with parasite resistance or need of treatment. As a complementary approach for biomarker identification, clinical (including blood cell population profile, blood biochemistry and average daily weight gain), metabolomic (319 signals) and faecal microbiota data were integrated to identify correlated signals associated with intrinsic resistance potential or need of treatment. This approach also has the potential to uncover biological signals that would be missed when considering each dataset independently 42 . We ran two analyses to extract the features from each dataset with best discriminant ability between TR and TS ponies either before the pasture season on one hand, or between TR and TS ponies in need of treatment at day 132 on the other hand.
In this analysis, ASV counts estimated from faecal microbiota data were aggregated at the genus level within each time point of interest (day 0 or day 132) using the tax_glom function of the phyloseq package v.1.32.0. They were then filtered to retain those reaching 5% prevalence (n = 45 and 41 at day 0 and day 132 respectively), and normalized with the centered log-ratio transformation of the mixOmics package 81 . 1 H NMR data consisted in the 319 metabolite signal intensities retained following filtering and outlier identification with ASCA. Clinical data consisted of 18 parameters. At day 0, ADG was not considered as no variation occurred leaving 17 parameters. FEC was not considered as it was used to define the groups to be compared.
Data integration was performed following the DIABLO (data integration analysis for biomarker discovery using latent variable approaches for Omics studies) framework 82 as implemented in the mixOmics package v.6.12.2. This algorithm implements a sparse generalized canonical correlation discriminant analysis (sGCC-DA) 42,82 . Briefly, DIABLO performs feature selection, thereby retaining the only bacterial genera, metabolite signal or clinical parameters with best discriminative ability between groups. Using this sparse dataset, DIA-BLO then seeks for latent components (linear combinations of features from each dataset, i.e. 1 H-NMR, faecal microbiota, and clinical parameters) that simultaneously explains as much as possible of the covariance between input datasets and the status of interest, i.e. pony resistance potential before or under infection 42,82 . We applied this analysis to discriminate between TR and TS ponies either before the onset of pasture season (day 0) or at pasture turnout (day 132). In each case, the number of features to be retained was determined by cross-validation analysis (10 × fivefold) with the tune.block.splsda() function exploring grids of 10-20 genera with increments of 5, 10 to 80 metabolite signals with increments of 10 and 5 to 10 clinical parameters with increment of 1. The www.nature.com/scientificreports/ correlation matrix between each input datasets was determined after running a partial least square analysis using the pls() function 81 . Following data integration, we applied linear regression models to quantify group and time variation in the levels of seven features that distinguished between TR and TS ponies both at day 0 and day 132. These most discriminant features had either an absolute contribution of 0.25 on the first sGCC-DA axis. For metabolite signal intensities, no metabolite matched this condition and all features contributing to the 1st axis at day 0 and 132 were retained. Enrichment analysis. To isolate biological pathways associated with discriminant 1 H NMR signals and bacterial taxa, enrichment analyses were run using the MetaboAnalyst v5.0 80 and MicrobiomeAnalyst 80 web interfaces respectively. Significant metabolite signals were tested for enrichment against KEGG 83 annotated metabolites and disease related blood biomarkers, while enrichment analysis on discriminant bacterial genera were run using taxa collections associated with aging or disease. Any enrichment with a FDR below 5% was deemed significant.
Validation in an independent set of data. We validated our biological signal on microbial and metabolomic data using the data from Peachey et al. 35 as an independent data set because they combined both 16S rRNA gene amplicon sequencing with metabolomic analyses. While they performed 1 H-NMR on faecal material, we wanted to evaluate how our results obtained from blood 1 H-NMR could match theirs. Raw data were retrieved and processed as ours, but following Peachey et al. ' read truncation parameters 35  Blood parameters of interest were validated in an independent set of 24 ponies with high FEC (821 eggs/g on average, ranging between 250 and 1700 eggs/g). Increased alkaline phosphatase had already been described in infected horses 32 and was not considered for validation.

Data availability
R script and raw data matrices are available. All the phenotypic and H-NMR data have been uploaded under the github repository available under the https:// github. com/ guiSa lle/ STROM AEQ repository. The raw sequences of the gut metagenome 16S rRNA targeted locus are available in NCBI under the Sequence Read Archive (SRA), with the BioProject number PRJNA413884 and SRA accession numbers from SAMN07773451 to SAMN07773550.