Genome-wide epigenetic modifications in sports horses during training as an adaptation phenomenon

With his bicentennial breeding history based on athletic performance, the Thoroughbred horse can be considered the equine sport breed. Although genomic and transcriptomic tools and knowledge are at the state of the art in equine species, the epigenome and its modifications in response to environmental stimuli, such as training, are less studied. One of the major epigenetic modifications is cytosine methylation at 5′ of DNA molecules. This crucial biochemical modification directly mediates biological processes and, to some extent, determines the organisms' phenotypic plasticity. Exercise indeed affects the epigenomic state, both in humans and in horses. In this study, we highlight, with a genome-wide analysis of methylation, how the adaptation to training in the Thoroughbred can modify the methylation pattern throughout the genome. Twenty untrained horses, kept under the same environmental conditions and sprint training regimen, were recruited, collecting peripheral blood at the start of the training and after 30 and 90 days. Extracted leukocyte DNA was analyzed with the methylation content sensitive enzyme ddRAD (MCSeEd) technique for the first time applied to animal cells. Approximately one thousand differently methylated genomic regions (DMRs) and nearby genes were called, revealing that methylation changes can be found in a large part of the genome and, therefore, referable to the physiological adaptation to training. Functional analysis via GO enrichment was also performed. We observed significant differences in methylation patterns throughout the training stages: we hypothesize that the methylation profile of some genes can be affected early by training, while others require a more persistent stimulus.

The horse is an extraordinary athlete, shaped by evolution and human selection, with a marked aptitude for speed and endurance.Being prey, the horse needed to run at remarkable speed for considerable distances to escape predators.The horse's anatomy is, therefore, adapted to speed and stamina.Man has exacerbated these innate athletic skills through the selection that introduced dramatic phenotypic differences to Thoroughbred or Arab breeds 1,2 .For example, the race-oriented selection had a strong footprint on Thoroughbred, enhancing highperformance characteristics 3 such as Type IIA muscle fiber percentage up to 80-90% and boosting mitochondrial density with the intense activity of oxidative enzymes 4,5 .
Genetics is just one piece of the puzzle, especially in sports: the full potential is reached only after proper athletic preparation.
The Thoroughbred is an early, fast-growing breed, and training usually begins at 15 months of age.One tangible aim in race-oriented training is to enhance the horse's ability to reach the highest speed before reaching VO 2 max.In untrained Thoroughbreds, this limit is approximately 50 km/h, but it can be pushed up to 60 km/h after proper and intense training 6 .Training must be carefully planned since, while VO 2 max and aerobic capacity should be achieved early in training sessions, skeletal muscle improvement can only be observed after 4-6 months.This physiological gap often causes long bone and soft tissue problems, especially in young subjects 7 .

Results
The genomic DNA from the buffy coat of twenty (20) Thoroughbreds (gathered in bulks of 4) was used to assess the DNA methylation changes during the first training season, monitoring the horses for three months after a month of acclimatization.To this purpose, the MCSeEd technique was applied, sequencing the fragments generated by a coupled enzymatic digestion using methylation-sensitive and non-sensitive enzymes.

Sequencing, mapping, and DMPs identification
Single-end reads were produced from the fifteen (15) libraries, and after demultiplexing and cleaning procedures as previously described, a total of 131,296,618 sequences were used for the MCSeEd bioinformatic pipeline.On average, 8.7 million reads per sample were produced (details in Table 1).
Uniquely mapped reads on EquCab3 were in the proportion-on average between samples-of 83.2% of the input sequences.Virtually all of them were then reassigned to the experiment-wise annotation.After robust filtering and normalization, 94,957 loci were input to methylKit for downstream analysis.This number was further reduced considering only those covered by at least ten reads and statistically different throughout the time points (DMPs, differently methylated positions) with a false discovery rate (FDR) < 0.05, resulting in 30,773 loci used for DMRs (differently methylated regions) discovery.
To verify filtering and experimental design soundness, a principal component analysis (PCA) was carried out (Fig. 1), obtaining a separate clustering of the bulks belonging to the same time (T0, T30, and T90) with the first two dimensions absorbing more than 40% of the total variance.The correct separation of libraries can be appraised also in the dendrogram shown in Supplementary Fig. S1.

DMR analyses
The DMRs were identified starting from the DMPs of pairwise comparisons between the three-time points through the iterative procedure previously described.Results are reported in Supplementary Tables S1, S2, and S3.
The genomic coordinates of the DMRs were intersected with the Ensembl annotation, obtaining a total of 733 genes involved.Table 3 gives each comparison's number of genes and the methylation state.In Supplementary Table S4, this information is available for single genes comprising the DMR site within the gene (gene body or regulatory region).
Using all comparisons, we tried to isolate a group of genes, hereafter named "early response", that could be influenced by training already at T30 (early stages of training); "mid response", conversely, are those genes that are not affected at T30, but in the following stages.A Venn diagram clarifying the rationale behind this diversification  www.nature.com/scientificreports/ is reported in Supplementary Figure S2.The genes present in the T90-T0 comparison, which are neither found in mid response nor early response, can be considered the ones whose modification in methylation is present continuously throughout the whole period, increasing or decreasing progressively.

Functional analyses and gene ontology
To highlight biological processes, molecular functions, and cellular compartments mainly involved in the response to exercise and training during the different stages, a Gene Ontology (GO) enrichment analysis was carried out.The complete results of the functional analysis for differentially methylated genes from all the comparisons (T30 vs. T0, T90 vs. T0, T90 vs. T30, early and mid response) are reported in Supplementary Table S5.Table 4 lists the first fifteen most significant (lower corrected p-value) terms.While the most represented terms for early response modulated genes were related to protein regulation and cell signaling, mid response genes were mainly involved in cell communication, synaptic transmission, and regulation of transmembrane transporter activity.The genes that were differentially methylated in the whole training time frame showed enrichment in categories related to blood circulation, cardiac functions, and immune regulation.

Gene expression
To validate the MCSeEd results, 10% of the highest differentially methylated genes among time points were checked for expression in silico in peripheral blood mononuclear cells (PBMC) racehorse transcriptomes and in a human PBMC atlas.The results obtained are reported in Supplementary Table S6.Moreover, some of these genes, which were particularly interesting from the functional analysis, were tested for their gene expression through RT-qPCR (Fig. 2).All the tested genes were expressed in PBMCs.Low expression levels were found for GRID1 and LHX1, which, indeed, were not modulated during training.Moreover, the agreement between the methylation results and gene expression analysis was checked (Table 5), and results were completely in line when both tests were statistically significant.Indeed, the lack of agreement exclusively occurred when one of the two tests did not reach statistical significance.
AP5Z1 was de-methylated in T90 compared to T0 and T30, and its expression increased in the same comparisons.This gene was also down-regulated in T30 with respect to T0, although no differences in methylation levels were found, indicating a possible post-transcriptional regulation mechanism.ARFGAP2 was de-methylated in T90 compared to T30 and T0, but the reduction in gene expression was significant only in T90 vs. T30.Also, in this case, a downregulation in gene expression was detected in T30 compared to T0, while no modulation in methylation was statistically significant.For EVI5, the de-methylation in T90 compared to T0 and T30 did not result in significant gene expression modifications, nor did the methylation in T90 vs. T0 for RADIL.Moreover, the expression of RADIL was lower in T30 compared to T0, although without differences in methylation, while the de-methylation in T90 vs. T30 matched with the gene expression increase.The RGS19 and ATP2A3 methylation in T30 compared to T0 generated a decrease in gene expression, as expected, while the up-regulation of the same genes in T90 vs. T30 was probably not induced by a de-methylation which was not statistically supported, as well as for TTC7A down-regulation in T30 vs. T0.Moreover, TTC7A was up-regulated in T90 compared to T30 and de-methylated in the same comparison, concordantly.As expected, the de-methylation of SCAND1 in T90 (either vs. T0 or T30) resulted in an increased expression in T90 (compared to T0).For the low-expressed genes LHX1 and GRID1, the modulations in methylation observed were not followed by differences in gene expression, probably for the low basal expression due to other regulatory mechanisms.

Discussion
Adaptation to exercise and training triggers various responses mediated by several molecular signaling events with complex spatial and temporal interactions involving the genome at different levels based on the stimulus's intensity, duration, and frequency.One aspect of this fine machinery is the epigenetic regulation, known to modulate gene expression with no sequence variation and to some extent heritable 23 .
Emerging evidence shows that physical activity influences DNA methylation in humans 24 and that trainability depends not only on the genetic code but also on epigenetic signals such as DNA methylation and histone Considering these assumptions, our study evaluated methylation changes in the white blood cell DNA of Thoroughbred horses at their first 90 days of training, identifying differentially methylated regions (DMRs) from differentially methylated positions (DMPs) through the iterative procedure described (Table 2).A list of Ensembl-annotated genes that matched at least one DMR was produced (Supplementary Table S4).Then, the 10% of top methylated and de-methylated genes in each comparison were identified and used for Gene Ontology (GO) categories enrichment analysis.For these genes, the expression in the same cells in which methylation was assessed was investigated through RT-qPCR.Results showed that all chosen genes were expressed in blood, and their modulation mostly agrees with methylation status, confirming that MCSeEd is a valid approach to exploring genome-wide epigenomic changes (Table 5).

Differentially methylated genes
Studies in humans and animal species (rat and horse) 8,26 report that most genes involved in this regulation are mainly ascribed to muscle growth and differentiation, innervation, and synaptic-related functions.In contrast, metabolic regulation is engaged to a lesser extent.
In our system, the GO enrichment analysis (Table 4, Supplementary Table S5) of the differently methylated genes showed cell signaling proteins, ion channels, innervation and synaptic transmission, heart conduction, and contraction as the most represented terms.It is worth noting that muscle growth does not depend only on the development and differentiation of muscle cells, which is necessary for hypertrophic and/or reparative processes induced by exercise, but also requires the innervation of motor neurons and the proliferation of blood vessels 8 .

"Early response" to exercise
The first training comparison (T30 vs. T0) (Table 2) revealed a prevalence of methylated regions (103 DMRs methylated vs. 23 de-methylated) that, once crossed with the Ensembl annotation pinpointed, respectively, 76 and 10 genes (Supplementary Tables S4 and S6).Considering the early response gene subset, thus excluding those in common with T90-T0 comparison (Supplementary Fig. S2), the prevalence of methylation is even more evident (64 methylated vs. 6 de-methylated genes).This is in agreement with what observed in humans: exercise and training initially lead to increased methylation of specific genes, especially those related to transcriptional activity, while if the training exceeds eight weeks, a general de-methylation is observed, especially for those genes related to metabolic and actin-cytoskeleton pathways 27 .

Functional analyses
Concerning the GO enrichment analyses of the early response genes, GO terms related to protein regulation and cell signaling were found (e.g., "protein maturation", "maintenance of protein location growth and development and cell signaling", "negative regulation of binding", "negative regulation of transmembrane receptor protein serine/threonine kinase signaling pathway"; Table 4, Supplementary Table S5).
If we take into account the whole gene set in T30-T0 comparison, additional GO terms, such as "regulation of blood circulation", "cardiac conduction", "regulation of heart contraction", and "immunological synapse", were Table 5. Agreement between MCSeEd and qRT-PCR results.Statistical significance of modulated methylation (de-methylated-methylated) or gene expression (up-regulated-down-regulated) is reported among the comparisons (T30 vs. T0, T90 vs. T0 and T90 vs. T30).N.S.: not statistically supported.The "Agreement" column was compiled only when both analysis outcomes were statistically supported.www.nature.com/scientificreports/statistically significant.Genes described by these categories are typically ascribed to other organs, but they can be expressed in blood cells and possibly with specific functions related to exercise.

Gene-expression analyses
Most early response differentially methylated genes were also differentially expressed in RT-qPCR experiments and related to exercise-activated pathways.In detail, for example, a gene also known to be methylated as a function of insulin and glucose levels in human athlete 28 , calcium pump ATP2A3, was found in several enriched categories and among methylated genes (Supplementary Tables S4 and S6).Moreover, as seen in RT-qPCR, the ATP2A3 methylation in T30-T0 generated a gene expression reduction (Fig. 2, Table 5).
Another gene enriched in several GO categories and methylated in the T30-T0 comparison is the regulator of G protein signaling (RGS).The members of the RGS family are molecules acting on the G protein-mediated signal as negative regulators.This gene is also expressed in blood cells and regulated by hormones, cytokines, and Ca 2+ oscillations 29 .This G protein is crucial for bone cell growth and differentiation, and this gene is frequently expressed in cardiac hypertrophy and heart attack 30 .This could mean that the stimulation of the musculoskeletal system and immune cells given by training could be driven by epigenetic mechanisms such as the methylation status modulation of key pathway genes for different target tissues.
Human data reveal no expression for the RGS family in PBMCs (Supplementary Table S6); in horses, it is expressed both at rest and immediately after the race.Moreover, in this experiment, the RGS19 gene, identified among the most methylated ones, is expressed in leukocytes and is down-regulated in T30 compared to T0, in agreement with its epigenetic regulation (Supplementary Table S4, Fig. 2).
Again, GRID1 is an interesting gene among the most methylated in early response, encoding a subunit of glutamate receptor channels.These channels mediate most of the fast excitatory synaptic transmission in the central nervous system and play key roles in the synaptic plasticity 31 .Modulation of glutaminergic synapses at the level of postsynaptic density by voluntary exercise was also highlighted in mice.Therefore the glutamatergic system could be a target of modulation through regular physical activity 32 .This finding is unexpected as no expression was found in the human or horse atlas; however, it is always expressed in our system but not modulated.
Since its function is strongly related to exercise, we can speculate that its methylation change may have a biological meaning (Supplementary Tables S4 and S6).
The LIM homeobox 1 gene (LHX1), an early response one, is involved in the locomotor apparatus regulating the differentiation of Purkinje cells which are responsible for fine control of movements 33 ; it is required for temperature resistance of the nervous system clockworks 34 , and it was recently found as a novel pancreatic islet regulator contributing to normal glucose homeostasis and glucagon-like peptide 1(Glp1) 35 .Intriguingly, in a study by Pereira et al. 36 , this gene was associated with race performance in racing Quarter Horse.Also, in this case, LHX1 does not appear to be expressed in humans or horse PBMCs, while from RT-qPCR experiments, it is expressed but not modulated across sampling time points.

"Mid response" to exercise
Focusing on the late phase after 90 days of training is also interesting.Indeed, in human athletes studies, it is shown that DNA methylation changes in response to exercise are a dynamic process activated in the early phase of gene expression; residual DNA methylation changes are retained after the training stimulus is gone, indicating that these modifications are accumulated over multiple exercise sessions 37 .
In our study, the comparison T90-T0 pinpointed 424 genes from 624 DMRs (Table 2) with 138 de-methylated genes vs. 286 methylated, resulting in an increase of the de-methylated genes with respect to the first part of training.In this comparison, all the modulated genes are considered from the beginning to the end of the training.
Of these, 205 (91 are methylated and 114 de-methylated) were classified as mid response genes (Supplementary Fig. S2).There seems to be a preference for de-methylation as training progresses, as this subset includes 91 out of 138 de-methylated genes (66%) and only 114 out of 286 methylated genes (40%).This is coherent with human athlete study findings, where a higher number of de-methylated sites was associated with hypertrophy in the muscle following a repeated stimulus of 7 weeks 13 .This indicates that changes in DNA methylation could be related to exercise training and that, as the time of training increases and optimal athletic condition approaches, de-methylation of some specific genes also increases.

Functional analyses
From the enrichment analysis on the mid response gene set, the most represented terms were "cell communication", "signal transmission", "synaptic transmission" (BinGo analyses) and "synaptic signaling", "regulation of transmembrane transporter activity" (ClueGo analyses) (Supplementary Table S5).

Gene-expression analyses
Also, among the mid response genes, some of the most de-methylated genes were RT-qPCR tested (SCAND1, RADIL, EVI5, AP5Z, TTC7A, ARFGAP2) (Supplementary Table S4).These genes encode for transcription factors, small GTPases-related genes for membrane transport processes, or other signal mediators, which fall into the GO-enriched categories for this comparison.
The SCAND1 gene, encoding for a transcription factor with DNA-binding activity, is one of the most demethylated in mid response.It has been detected as a co-expressed gene in a network analysis from muscle and neutrophil tissues: this suggests the preservation of robust gene expression profiles among a blood leukocyte subpopulation and the skeletal muscle in response to physiological stresses 38  This supports the idea that neutrophil gene networks may help track physiological and pathological changes in the muscle tissue, especially for genes that are functionally related to post-translational signaling mechanisms such as acetylation 38 .SCAND1 is expressed in human and horse PBMCs and our system with concordance between the methylation change and the relative gene expression.
RADIL gene that encodes for Ras-associated and dilutes domain-containing protein with a GTPase binding activity and is involved in the regulation of cell-cell adhesion in endothelial cells 39 , also matched its de-methylation in T90-T30 with the gene expression increase in RT-qPCR.
Also, EVI5, a product of the EVI5 gene, acts as a GTPase Activating Protein (GAP) for the Rab family GTPases that regulate membrane traffic.Moreover, the ARFGAP2 gene encoded for Arf proteins, small GTPases that are key elements of downstream signaling pathways regulating multiple effector proteins and functional responses of cells.Arf protein regulates polymorphonuclear neutrophil functions such as superoxide production, degranulation, and chemotaxis 40 in blood.Both genes are expressed in human and horse PBMCs, and in the horse buffy coat analyzed in our study.T90-T30 de-methylation of ARFGAP2 corresponded to an increased expression in RT-qPCR, while T90-T30 de-methylation of EVI5 did not correspond to a significant gene expression modulation; however, the regulation of transcription does not depend only on the state of DNA methylation.
The expression of the AP5Z1 gene, encoding a crucial protein for intracellular cell trafficking by controlling features such as cell development, signal transduction, apoptosis, and proliferation pathways 41 , is instead modulated accordingly with its methylation state in different libraries.Like the TTC7A gene (up-regulated in leukocytes of the T90 library compared to T30 and de-methylated in the same comparison) that encodes a scaffolding protein, facilitating the synthesis of PI4-phosphate (PI4P) essential for promoting transport of secretory proteins 42 .
In conclusion, in this study, we applied, for the first time, the methylation content sensitive enzyme ddRAD (MCSeEd) technique to animal blood cells and observed substantial and significant differences in methylation patterns during early training stages with insights also in mid and late training stages where more persistent stimuli are required.
The methylation changes observed are dramatically high and largely a consequence of the incrementing training, although we cannot completely rule out other environmental stressors, such as weather conditions.Some authors report, although with not so compelling results, that seasonality and mean temperature during short-, medium-, and long-term exposures can mildly modify the methylation status of a genome [43][44][45] .In our case, the THI (temperature humidity index) was monitored to evaluate its changes throughout the time span of the experiment 46 .The minimum THI for all the days was lower than 70, which is the critically recognized threshold in farm animals to pinpoint heat stress 47 .To limit the possible changes induced by high environmental temperature, both blood sampling and training were carried out early in the morning (6:30 AM), a time which is not associated with acute heat stress (48.1 < THI < 64, Table 2 in 46 ); in the hottest hours, animals were kept under indoor-temperature-controlled stables.The methylation changes associated with temperature previously observed in literature were on a different scale compared to those we found, both in terms of methylation levels and the number of implicated genes that were different and also on other functional categories; only 4 out of 446 (0.9%) genes here differentially methylated have already been associated with temperature 44,45 .Moreover, even though a change does exist, spring and summer-our monitored time frame-appear to exert similar outcomes 43 .
All considered, the environmental footprint on methylation due to temperature and humidity is very likely masked by the more impactful effect induced by training.
Our results indicate that the genome-wide approach is mandatory to fully understand and characterize a complex phenomenon, such as the response to training.
We have confirmed that genes belonging to GO categories, typically linked to other organs, can be expressed in blood cells with specific functions also related to exercise.Indeed, we have highlighted that genes-and therefore relative functions and GO categories-are far from fully characterized, especially those expressed in blood tissue that mimics the systemic response.
The results of this research could be prodromal to the characterization of the epigenetic response induced by exercise in the horse athlete, to enhance training schedules, improve performances, and better respect animal health and welfare.

Animals enrolled
For this study, 20 Thoroughbreds (8 males and 12 females) with a mean age of two years (Supplementary Table S7), clinically healthy, and never trained for flat racing were recruited.For sampling, an aliquot from the blood sampled for routine controls to assess the health of the animals during the training season, allowed by the Italian Horse Racing Board and performed by the authorized veterinary practitioner, was taken 48 .All animals were enrolled in the study after the owners' and trainers' written informed consent in compliance with the Italian Regulation D.L. 116/1992.Anyway, the animal care procedures were compliant with the European recommendations (Directive 2010/63/EU) for the protection of animals used for scientific purposes, and the study is reported in accordance with the ARRIVE guidelines.
All the horses were managed in the same stable with individual housing with natural temperature and photoperiod.The training program is summarized in Table 6 and was performed for each horse from Monday to Saturday.The two months before the training start (T0) were used to acclimate the animals.
At the end of the experimental period, all subjects participated in one or more competitions; none showed a poor performance syndrome during the study or competition period.Weather conditions (temperature and relative humidity) were monitored during the whole period of training, collecting data at a weather station 6.6 km away from the training center at noon and calculating the temperature humidity index (THI) as reported in our previous study 46 where samples of the same cohort were used.

Sampling
The sampling activity was carried out once a month, from March 2018 to July 2018, at 6:30 AM before training and feeding, while at T90, the collection was carried out before the race.The experimental period was divided into five times, thirty days apart (T-30, T0, T30, T60, and T90).March (T-30) was considered to be the month in which the animals started the light gallop; April (T0) was the first month of training with racing simulation (gallop).From April to July, the training was incremental (see details in Table 6).Blood samples were collected from the jugular vein in Vacutainer tubes (10 ml; Terumo Corporation, BD; Tokyo, Japan) with EDTA.Samples at times T0, T30, and T90 were used to evaluate DNA methylation changes.

DNA purification, library construction, and sequencing
The genomic DNA was extracted from the buffy coat with GenElute ™ Mammalian Genomic DNA Miniprep kit (Sigma-Aldrich, St. Louis, Missouri, USA) and subsequently quantified with a NanoDrop2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and Qubit fluorometer (Thermo Fisher Scientific, Waltham, MA, USA).Successively samples were grouped into pools of 4 subjects at each time point of sampling, obtaining five different bulks (Fig. 3, Supplementary Table S8).We proceeded with bulks to mitigate individual genetic variation and reduce costs.
The library set-up protocol was performed according to Marconi et al. 21with some modifications, as described below.Before digestion, five methylation enzymes were evaluated for the best restriction profile for CG methylation context: AciI, AclI, AgeI, BsrFI, and BstBI, with MseI as a companion enzyme.To define the expected restriction pattern of an enzyme combination, the genome was scanned in silico, calculating the size distribution of the restriction fragments with the optimal length range (200-700 bp) effectively captured by MCSeEd genomic libraries.AciI was chosen since it presented the highest number of suitable fragments (Supplementary Table S9).
For each library, 150 ng of DNA were double-digested with a combination of the two enzymes (5U of AciI and 5 U of MseI restriction enzymes, New England BioLabs, Ipswich, US), in the presence of 2 μM of unique barcoded adapter, 2 μM of unique common Y adapter, 1 U of T4 DNA ligase (Thermo Fisher), 0.2 mM ATP and 1 × RL buffer (5 × CutSmart Buffer, New England Biolabs, 25 mM DTT, Invitrogen) for a final volume of 50 μL.
The libraries were then pooled, as reported in the sequencing experimental design (Supplementary Table S8, purified using magnetic beads (Agencourt AMPure XP; Beckman Coulter, MA, USA), size selected by gel electrophoresis and purified using QIAquick Gel Extraction kits (Qiagen, Aarhus, DK) for fragments in the range   of 200 bp to 700 bp.Size-selected libraries were quantified using a fluorometer (Qubit; Agilent Life Technologies, Santa Cruz.CA, USA), and a normalized DNA amount (15 ng) was amplified with a primer that introduced an Illumina Index (at the Y common adapter site) for demultiplexing.Following PCR with uniquely indexed primers, multiple samples were pooled.PCR enrichment was performed as previously described 21 .Amplified libraries were purified with magnetic beads (AMPure; Beckman Coulter, Brea, CA, USA) and then quantified (Qubit and Bioanalyzer 2100: Agilent Life Technologies).The grouped libraries were pooled in equimolar amounts, and the final library was Illumina-sequenced using 150-bp single-end chemistry.

Bioinformatic analyses
Briefly, the MCSeEd pipeline follows this rationale.When a reference genome is available 49 , sequences are mapped on it, and alignment coordinates are saved in an "experiment-wise annotation" that allows creating a count matrix where rows are methylation-affected loci and columns are samples.The count file is then filtered and processed to ultimately obtain the differentially methylated positions (DMPs), according to the following steps: (i) standardization of libraries; (ii) filtering based on the coverage with at least ten reads per locus; (iii) calculation of the relative levels of methylation in each locus; (iv) parsing of datasets for methylKit R package usage 50 .In-depth details on the technical procedure are available in Marconi et al. 21.
DMPs were therefore identified as sites that showed significant differences in the methylation levels between the treatments, using logistic regression as implemented in methylKit.The DMPs were called following the methylKit manual best practices.These positions were used to calculate the window within is possible the identification of a differentially methylated region (DMR) where at least two DMPs concordant (de-methylated, methylated) and statistically significant (FDR < 0.05) rely on.To achieve optimal DMR identifications, we maximized the number of DMRs in a series of adjacent sliding windows with an iterative procedure considering different window lengths (100 bp to 2000 bp with 100 bp pace).

Differentially methylated genes
To obtain a gene list with methylation level modulation during training, we produced a BED format file of our DMRs that served as input, together with the complete annotation of EquCab3, Ensembl 101, downloaded from Biomart (http:// www.ensem bl.org/ bioma rt/), to the bedtools utility (intersect function).Both the gene body and regulatory regions, 2.5 kb upstream of the transcription start sites (TSSs) for strand + genes and 2.5 kb downstream of the transcription termination site (TTS) for strand-genes, were considered for the intersection.

Enrichment analyses
The gene list from the previous step was the base information for an enrichment analysis using Gene Ontology (GO) categories via BinGO 51 and ClueGO 52 , both implemented in the Cytoscape suite 53 .These two pieces of software use different algorithms to reveal the enrichment of the GO terms, and we decided to pursue a conservative approach with a "consensus analysis"; only GO terms belonging to the categories "biological processes", "molecular functions", and "cellular compartments" statistically significant (FDR < 0.05) in both analyses were considered and discussed.

Gene expression assessment
To validate methylation status/effects, the most modulated genes found with the highest differences in methylation among time points were selected (10% most methylated and 10% most de-methylated within each comparison).The expression levels of these genes were checked in our RNA-seq data of peripheral blood mononuclear cells (PBMCs) of racehorses at rest and after gallop race 54 and, to further consolidate our findings, in a human PBMC gene expression atlas (EMBL-EBI Expression Atlas).
Moreover, the relative gene expression of some of these genes (see list in Supplementary Table S10) was assessed through RT-qPCR.Total RNA was extracted from the buffy coat of the 20 horses for T0, T30, and T90, using Invitrogen™ TRIzol™ Plus RNA Purification Kit (Thermo Fisher Scientific, Waltham, MA, USA), following the manufacturer's instructions.RNA extraction was assessed using a NanoDrop2000 Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA).As for methylation assessment, samples were pooled into five bulks, each containing the RNA of the same four horses mixed for DNA.
The same amount of RNA for each bulk (1 µg) was reverse-transcribed using the SuperScript® VILO IV TM Master Mix (Thermo Fisher Scientific, Waltham, MA, USA), following the manufacturer's guidelines.The cDNA was diluted in nuclease-free water 1:10, and the amplification was performed on CFX96™ Real-Time System (Bio-Rad, Milan, Italy) following a protocol developed in previous studies 55 .The succinate dehydrogenase complex flavoprotein subunit A (SDHA) and hypoxanthine phosphoribosyltransferase 1 (HPRT) are optimal reference genes for blood cells in horses and were used to normalize target gene expression levels 56,57 .Target gene primers were designed through the Primer3 online platform (https:// prime r3.ut.ee, accessed on 30th October 2023) and placed in different exons or at exon-exon junctions to avoid biases due to genomic DNA amplification.Supplementary Table S10 reports primer pair sequences for tested genes: SCAN domain containing 1 (SCAND1), glutamate ionotropic receptor delta type subunit 1 (GRID1), LIM homeobox 1 (LHX1), regulator of G protein signaling 19 (RGS19), rap associating with DIL domain (RADIL), ecotropic viral integration site 5 (EVI5), adaptor

Figure 1 .
Figure 1.Plot of components 1 and 2 identified by the PCA analysis.Clear clustering of samples in homogeneous groups for time points is evident.Sample groups are indicated in the legend.Correct library separation can also be appraised in the dendrogram shown in Supplementary Fig. S1.

Figure 3 .
Figure 3. Scheme of the experimental design from sampling to obtain bulks and respective libraries, sequencing, and bioinformatics analysis.

Table 1 .
Sequencing and mapping statistics per sample.

Table 2 .
Number of DMPs and DMRs in the three comparisons with relative modulation.

Table 3 .
Number of differentially methylated genes among the three comparisons, with relative quantities of de-methylated and methylated genes.

Table 4 .
The first fifteen enriched GO terms (lowest adjusted p-value by Bonferroni correction) resulted from the ClueGO algorithm for differentially methylated genes during early response, mid response, and whole period.
Vol:.(1234567890) Scientific Reports | (2023) 13:18786 | https://doi.org/10.1038/s41598-023-46043-wwww.nature.com/scientificreports/modifications.The most accessible tissue where epigenetic modifications (especially global methylation profile) can be investigated is the blood.However, many challenges must be faced because patterns change rapidly depending on the intensity and duration of the exercise 25 .Investigating methylation changes during a long training period, especially in never-trained athletes, could give more consistent results than comparing subjects of different ages and training.

Table 6 .
Standard daily training program completed by each horse involved in the study.