Multivariate modelling of milk fatty acid profile to discriminate the forages in dairy cows’ ration

Although there are many studies on the importance of fatty acids (FA) in our diet and on the influence of dairy diets on FA metabolism, only a few investigate their predictive capacity to discriminate the type, amount and conservation method of farm forages. This research quantifies differences in milk FA concentrations and, using a supervised factorial discriminant analysis, assesses potential biomarkers when replacing maize with other silages, grass/lucerne hays or fresh grass. The statistical modelling identified three main clusters of milk FA profiles associated with silages, hays and fresh grass as dominant roughages. The main implication of a dairy cow feeding system based on poliphytic forages from permanent meadows is enhancing milk’s nutritional quality due to an increase in beneficial omega-3 polyunsaturated FA, conjugated linoleic acids and odd chain FA, compared to feeding maize silage. The study also identified a small but powerful and reliable pool of milk FA that can act as biomarkers to authenticate feeding systems: C16:1 c-9, C17:0, C18:0, C18:3 c-9, c-12, c-15, C18:1 c-9, C18:1 t-11 and C20:0.

supervised modelling, can determine a functional relationship between the analytes (i.e., FA) and the predictors (i.e., forage type), identifying useful features which discriminate between classes 21,22 .
This research quantifies differences in milk FA profiles from replacing maize silage with (a) silages from other cereal or legume crops, (b) grass and lucerne hays or (c) fresh grass, using supervised factorial discriminant analysis to verify if differences in FA can be used to fingerprint dairy production chains. Moreover, a linear regression model and clustering of the variability by a set of descriptive statistics were performed to predict milk FA profile in relation to dietary forage.

Materials and methods
Ethical statement and experimental design. The study did not influence farm activities or management strategies and did not involve invasive procedures or manipulation of lacating dairy cows. Since the impact on the animals' welfare was negligible, ethical review and approval from the local or national ethics committee was unnecessary. With farmer consent, a qualified veterinarian collected records, feed (pre and post feeding) and bulk tank milk samples from 14 commercial dairy farms in the middle of the Italian lowland area, Po Valley (North East of Italy, 45° 19′ 49″ N 9° 47′ 56″ E). The farms were selected to represent average herd size and milk yield characterizing the Italian intensive dairy system 23,24 and all were affiliated to Regional Breeders' Association, ensuring descriptive characteristics were recorded monthly over the experimental period (Table 1). Records were collected on diet details and milk production at each sampling visit (5 per farm) and averaged per lactating cow per day. On this basis, the average dry matter intake (DMI) was calculated by difference between amount of total mixed rations (TMR) distributed to the lactating cows and refusals after 24 h or before the subsequent distribution. Milk production was standardised to 'Fat Protein Corrected Milk' (FPCM) as per International Dairy Federation (IDF) 25 using the following equation: where Y, milk yield as kg/day; F, fat as percentage; TP, true protein as percentage (= CP × 0.93, and CP as N Kjeldahl × 6.38); 0.0929, 0.0588 and 0.7576 are Mcal/kg of F, TP and standardized milk (4.0% F and 3.3% TP), respectively.
Sample collection and chemical analysis. In 2018, five raw bulk milk samples were collected from each farm in March, May, July, September and December (n = 70) and at each visit, the current TMR was also sampled and formulations recorded. Since it is not uncommon for farms to alter diet ingredients due to seasonal feed supply, some farms changed TMR formulation over the experimental period, essentially changing group.
In details: HMS covered 4 farms and n records = 20; MMS, 4 farms and n = 18 (because one original MMS farm changed twice, once into MCS and once into HAY); MCS, 2 farms and n = 11 (because one switched from MMS); HAY, 2 farms and n = 12 (because one switched from MCS and another one from HAY); GRG, 2 farms and n = 9 (because one orginal GRG farm changed once into a HAY diet). However, according to Rego et al. 14 30 ) and ANKOM technology for neutral detergent fiber (aNDF) 31 and acid detergent fiber (ADF) 32 . The milk proximate composition (crude protein, casein, fat, lactose) and chemical traits (urea, pH) were recorded by a Fourier transform mid-infrared (FT-MIR) spectroscopy technique using a MilkoScan FT6000 (Foss Electric A/S, Hillerød, Denmark). Additionally, the somatic cell count (SCC, 100,000/ml) was performed by a Fossomatic 5000 (Foss Electric A/S, Hillerød, Denmark) and reported as SCC score calculated with the following formula [log2 (SCC/100,000) + 3].
For milk FA analysis, 2 replicates of approximately 35 g from each sample were freeze-dried, mixed to a fine homogenous powder and transferred to suitable vials. These lyophilized samples were methylated and esterified to prepare for gas chromatography (GC), as described by Chilliard et al. 33 and Stergiadis et al. 34 . The chemicals used for extraction of FA, correction for SCFA, analytical standards and identification of peaks followed the methodology of Stergiadis et al. 35 . To optimize peak separation, modifications to the chromatographic conditions from the original method by Chilliard et al. 33 was followed, as reported by Stergiadis et al. 35 . FA results are expressed as g/100 g of total quantified FA. Values for individual FA were used to calculate total saturated FA (SFA), short chain (≤ C10) FA (SCFA), monounsaturated FA (MUFA), polyunsaturated FA (PUFA), conjugated linoleic acids (CLA), highly unsaturated (≥ 4 double bonds) FA (HUFA), odd chain FA (OCFA), n-3 (omega-3 FA), n-6 (omega-6 FA), HUFAn-3 as well as n-3:n-6 and n-6:n-3 ratio.

Statistical analysis.
All analyses were carried out using the SAS 9.4 software (SAS Institute Inc., Cary, NC, USA) and XLStat (Addinsoft, release 2016, New York, USA). Herd performances (DMI and milk production expressed per cow per day) and raw bulk milk chemical and FA profile data were analysed using a linear mixed model that included the fixed effects of feeding group (FG: i-v) and the random effect of the farm (SAS PROC MIXED). Pairwise comparisons among levels of the FG factor were performed using Bonferroni correction. The hypotheses of the linear model on the residuals were graphically assessed.
The dataset of FA profiles was subjected to supervised multivariate factorial discriminant analysis (FDA), considering the FG as the predictor factor. The FDA split the total variance in four main canonical functions; F1-F4. The outcomes of the FDA were plotted to classify the five FG according to the first two main canonical functions F1 and F2. The correlation coefficients (with absolute value greater than 0.20) between the original FA and F1 and/or F2 were also plotted in the FDA-scattergram. The reliability of the FDA classification model was assessed by a leave one out cross-validation (SAS PROC DISCRIM). A confusion matrix was built throughout the results of the procedure and the classification performance was assessed using accuracy, precision, sensitivity, specificity and Matthews correlation coefficient (MCC) 36 .
A multiple stepwise regressions (SAS PROC REG) were preformed on the four main forages types (maize silage, mixed crop silages, grass and lucerne hays, fresh grass) on some FA and their derived chemical classes (SFA, MUFA, PUFA, CLA, HUFAn-3, OCFA). The regression coefficients were estimated. The most discriminative FA selected by the FDA were graphically represented by some box-whisker plots across the five FG.

Results
Dairy farm description, herd performance and milk quality. Mean herd characteristics of the five FG are reported in Table 1 showing major differences in herd size-farms using maize silage milked more cows than farms feeding dried or fresh grass/legume forage. Table 2 shows there were also significant differences between FG for both daily milk yield per cow and DMI (likely to be linked) as well as some aspects of proximate composition. The lowest daily FPCM yield was recorded for the GRG group, which also showed significantly lowest CP, casein and lactose concentrations. www.nature.com/scientificreports/ Fatty acid profile. Table 3 reports mean concentration for each FG of abundant FA and those expected to be influenced by roughage sources although results for all 74 profiled FA are reported in a supplementary table.
The FA profiles differed between FG for VA, C18:2 c-9, t-11 (CLA9), C20:5 c-5, c-8, c-11, c-14, c-17 (EPA), total CLA concentrations; all being higher (p < 0.05) for GRG than HMS milk and concentrations of SFA and SCFA were lower (p < 0.05). Differences also reached significance in comparing GRG and MMS milk for CLA9, total CLA and SCFA whereas for CLA9, GRG milk was significantly higher than for all other groups except HAY and SFA concentrations were lower than all groups except MMS. Differences were also significant in comparing GRG with MCS and HAY milk, where C16:0 (palmitic acid, PA) was lowest and PUFA concentrations highest in GRG milk. Other differences also existed when comparing HAY milk with the other groups; linoleic acid (LA, C18:2 c9, c12) had a tendancy (p = 0.066) to be lower than in GRG milk, and milk from the HAY group had more (p < 0.05) alpha linolenic acid (ALA, C18:3 c9, c12, c15) and total n-3 than HMS milk, resulting in a lower n-6:n-3 ratio (p < 0.05).
Factorial discriminant analysis. The factorial discriminant analysis (FDA) resulted in two main significant functions (F1 and F2; Wilks's λ = 0.002), accounting for 59.0% and 20.1% of the total variance, respectively. The FDA identified the 9 most significantly (p < 0.05) discriminative FA: C9:0, C10:0, C16:1 c-9, C17:0, C17:1 c-9, stearic acid, (SA, C18:0), ALA, CLA 9 and C20:0; all of which have a correlation coefficient in absolute value greater than 0.25 with F1 and/or F2. Figure 1 shows a FDA scattergram with these discriminative FA against F1 (x-axis) and F2 (y-axis). The FA contributing most to differentiate FG under FDA were poorly aligned with the those identified to differ by the univariate analysis; indeed, only ALA and CLA9 proved to be significant under both analyses. As reported in Fig. 1, the GRG and HAY milk FA profiles clearly differ from silage-based diet profiles and between each other, although there are considerable overlaps among HMS, MMS and MCS samples. HMS and MCS seemed similar and only partially overlap with the MMS group. The cross-validation used to assess FDA reliability confirmed the accuracy of this supervised targeted model for the correct classification of milk from HAY and GRG groups (Matthews correlation coefficient values of 1.00), however, there was a noticeable misclassification among the silage-derived milk samples, especially for MCS, with 5 out of 11 samples wrongly assigned to HMS (Table 4). However, if all silage samples were considered as single cluster, as suggested by the FDA, the predictive performances is enhanced.

Prediction of milk FA composition.
The results from the multiple linear regressions using the most predictive FA for the four forage sources (maize silage, other silages, hays, fresh grass) are reported in Table 5. Indeed, although both silages (maize and 'others') slightly influence individual FA concentrations, they significantly increased total SFA and, consequently, reduced PUFA, especially ALA and CLA9. Hay feeding is positively correlated with C17:0 and CLA9 and negatively with SA and concentrations, resulting in higher PUFA. Hay also seems to both decrease LA and increase SFA, even though the effect on SFA is weaker than with silages, especially those from mixed-crops. Feeding fresh grass seems to modify the FA profile mildly, even if it too contributes to higher concentrations of two beneficial FA-C17:0 and CLA9. In the case of OCFA there were no significant predictive capacity by any of the roughage sources. Table 2. Effect of dietary roughage source (forage group) on DM intake (DMI), milk production, composition and quality traits. FPCM, fat protein corrected milk (4.0% fat and 3.3% true protein); SCC, somatic cell count as log 2 (SCC/100,000) + 3; HMS, high maize silage; MMS, medium maize silage; MCS, mixed crop silages; HAY, grass and lucerne hays; GRG , green grass; SEM, standard error of the means. a-b LSMeans in a row without a common superscript differ (p < 0.05). www.nature.com/scientificreports/   www.nature.com/scientificreports/

Discussion
The study was designed to model milk FA responses, in high genetic merit dairy cows under lowland field conditions. Univariate statistical analysis assessed the main differences in milk production, quality and FA profile between the feeding groups, then a supervised factorial discriminant modelling identified the FA fingerprinting of the forage groups and finally, the multiple linear regression equations verified the magnitude of the relationships between discriminant FA and forage types. As expected, the herds' productive performance were closely linked to feed consumption or DMI, although the potential production of herds feeding maize-silage might have been strategically limited to enhance milk quality and qualify the farms' destination to protected designation of origin (PDO) for hard cheese production. The lower milk CP, particularly casein content, in GRG samples may be due to an inbalance in ruminal degradability for highly fermentable fibre and N-sources, typically seen for leafy grass consumption. Moreover, the low CP and casein in GRG-milk could also be due polyphenol oxidase activity (from red clover and other legumes) reducing protein degradability in the rumen, and hence amino acid supply, although this is speculative 37 . The lower lactose content in milk from GRG cows, could be due to a combination of lower concentrate supplementation and total feed intake, potentially leading to lower rumen propionate synthesis compared with other groups 22 . Since propionate is the main precursor of ruminant gluconeogenesis, this can lead to a less glucose and hence lactose synthesis. Another explanation might relate to milk yield 38 , as higher milk intermarry pressure (for the other forage groups), can increase milk lactose concentration. In contrast to other studies 12,39 , total milk fat was similar across all forage types, probably a reflection lack of variation in the concentration of antilipogenic FA, such as the rumen intermediate trans C18:1 isomers like C18:1 t-10. www.nature.com/scientificreports/ Major differences between the FG existed for the concentration of most nutritionally relevant FA, driven by the amount and type of forage in the diets. These often reached significance between GRG and HMS milk, although in some cases GRG milk also differed from MMS, MCS and/or HAY groups with no consistency in the pattern of variation seen for the individual FA. One noticeable outcome from our study is a significant (p < 0.05) effect of hay and fresh-grass feeding on the concentrations of total CLA, CLA9 and its precursor VA in milk (especially if compared to HMS), in line with previous studies 9, 13,16 . Indeed, dairy diets based on fresh grass have led to significantly higher concentrations in VA, CLA9 and total CLA 40 . Many studies report that forage from permanent meadows produces milk with more CLA and n-3 than from maize or other cereal silage diets [41][42][43] , as in our survey. These elevated concentrations of CLA are likely to be due to polyphitic forage, rich in LA and ALA, which undergo incomplete hydrogenation, generating the intermediate VA rather SA (C18:0). Both hydrogenation products (VA and SA) are subsequently desaturated in the mammary gland producing CLA9 and oleic acid (OA, C18:1 c-9) respectively into milk 44 . As in this study, Akbaridoust et al. 12 , confirmed lower milk CLA9 concentrations from partially replacing lowland grazing with maize silage. Both HAY and GRG forages in this study originated as polyphitic vegetation from permanent meadow also leading to higher concentrations of ALA, EPA and n-3 compared with other diets, although differences between GRG and other groups do not always reach significance. Compared to grass only silage diets, the inclusion of red clover silage in TMR-fed cows has led to significantly more n-3 in milk, especially as ALA 45 .
Feeding maize silage as the dominant roughage in this study caused higher concentrations of SCFA (C4:0 and C6:0) and PA, increasing total SFA in milk compared with other FG. Maize silage also elevated n-6:n-3 ratios (mostly driven by lower n-3 rather than more n-6) which is common with other studies 10,19 , however not   Table 5. Multiple linear regression equations of the most discriminant milk fatty acids (FA) and their derived chemical classes based on the dietary forage sources (% DM). Each equation (data of FA are as g/100 g of total detected fatty acids) is presented in the following format: intercept of the model (± standard error) and regression coefficient of the forages, when significant (*p < 0.05; † p < 0.10; ns = p > 0.10). The p value refers to the significance of the regression model. LA = linoleic acid, C18:2 c-9, c-12; ALA = alpha linolenic acid, C18:3 c-9, c-12, c-15; CLA9 = C18:2 c-9, t-11. For the other fatty acids abbreviations see Table 3. www.nature.com/scientificreports/ all findings here are in line with previous reports. With respect to C4:0 and C6:0, Yang et al. 46 , report increasing maize silage reduced their concentrations although another Coppa et al. 47 , highlighted increased de novosynthesized FA (from C4:0 to C14:0) with reduced consumption of fresh grass (replaced with maize silage) in the cows' diet. Moreover, this latter study observed a significant increase in the proportion of C8:0 to C12:0 with higher inclusion of maize silage and concentrates in the TMR. Other studies also report greater secretion of PA in milk 13,14,18 with increasing inclusion of maize silage, although Liu et al. report this major SFA was not influenced by the relative proportions of maize and grass silage in a broad study under natural uncontrolled conditions 21 . Short chain and some medium chain FA are mainly produced by de novo synthesis in the mammary gland, using acetate and butyrate from the ruminal fibrolytic bacteria activity, although some (particularly PA) can be derived directly from the diet, especially with greater use of maize silage 19,48 . With the exception of the HAY group, the NDF content (an indication of digestible fibre) of all diets tended to be remarkably consistent, with mean levels ranging from 36.8 to 37.7%. This might explain why SCFA were not lower with maize silage although does not explain the apparent slightly higher de novo synthesis compared with other forages. Milk SCFA content has been demonstrated to related to grass botanical origin rather than dietary NDF content per se, confirming that mammary de novo FA synthesis could be affected by the proportion of unsaturared FA, probably ruminal biohydrogenation intermediates 39,49 . This study aimed to evaluate the influence of the feeding system on the lipidic fingerprinting in milk, considering TMR diets with five main roughage sources. Thus, a factorial discriminant analysis (FDA) was carried on the 70 milk FA profiles to identify changes when maize silage was replaced with a mix of other ensiled, dried or fresh forages. Figure 1 confirms HAY-milk samples correlate with ALA (C18:3 c-9, c-12, c-15 on the chart), proving once more to be a specific strong biomarker of hay-based diets 20,21 , with a minor contribution of the long chain SFA C20:0. On the other hand, GRG milk seemed to be characterized by more CLA9 (C18:2 c-9, t-11 on the chart) and C17:0, even if these FA also discriminate HAY samples. Both FA have previously been identified as biomarkers of fresh grass-based milk by Butler et al. 40 and Paredes et al. 16 , respectively. As discussed, the discriminative capacity of CLA9 is likely to be due to the incomplete rumen hydrogenation of dietary LA and ALA and subsequent desaturation in the mammary gland. The odd chain FA C17:0 is derived largely from the rumen microbial activity and its transfer into milk is reported to be enhanced for cows fed hays and fresh grass rich in C18 FA 16,17 -similar to grasses and legume species fed to the HAY and GRG cows in our study. C16:1 c-9 was identified as a weak lipidic biomarker of MCS and HMS milk samples, with only minor discriminative capacity, slightly correlated with F1 and, as with C17:1 c-9, it appears associated with both MCS and GRG milk. However, explaining their discriminative roles is not easy. From the literature, C16:1 c-9 seems to indicate both the use of maize-based diets 18 and the adoption high concentrates diets 20 , whereas C17:1 c-9 has been reported to be associated with fresh grass feeding 18 and both are the result of Δ9-desaturation (of C16:0 and C17:0) in the mammary gland. Milk from the three FG feeding silages tended to have similar FDA loadings making them spatially overlapping as a single cluster in the left-centre of the scattergram, associated with C9:0, C10:0 and SA. However, MMS group is slightly separated from the other two because of the influence of C10:0 and C18:0 (SA). Other studies report, compared to rations based on a fresh grass, feeding highly digestible silages, seemed to increase the proportion of SFA, such as C10:0 19,20 and SA 12 , due to a more extensive ruminal biohydrogenation. Although univariate analysis did not detect any significant difference between FG for SA (C18:0) (Table 3), the strong correlation with MMS could be explained by the highest milk production by this group, possibly causing a slightly negative energy balance and release of SA from mobilized body fat 50 .

Fatty acids Intercept
Extending the findings of the multivariate discriminant model to the large-scale distribution maybe effective to identify dairy products based on, at least, the three feeding strategies investigated in the present study: ensiled (HMS, MMS, MCS) vs. dried (HAY) vs. fresh (GRG) forages; even if they are all produced in the same geographic area. Thus proving the effective role of FA profile to trace the dairy products according to feeding system. Indeed, milk FA profile can be a powerful, reliable and accurate metabolomics tool to discriminate production rations high in cereal-derived silages or a mix of grass and legume-derived hays, which affect the nutritional value of the resulting milk (incidence of beneficial FA), contamination risk (i.e., presence of clostridium bacteria) and the sustainability of system (ratio between input and output of human edible energy).
The findings discussed already are mostly confirmed by the stepwise regression models. Indeed, although both maize and others silages slightly influence individual FA, they significantly increased total SFA and, consequently, reduced PUFA, especially for ALA and CLA9 concentrations. An overview of the predictive regression results seemed to highlight the main consequence of replacing maize or other silages with hay or fresh grass is the increase in PUFA beneficial for human health especially CLA and ALA. However, Fig. 2 also shows how variable the concentrations of ALA and total PUFA in milk from the HAY group are, probably because of the range in botanical composition and forage maturity at harvest across the farms throughout the study. Furthermore, feeding silages, especially from cereals other than maize, seem to increase the SFA content of milk, more than hay does, shown by their higher regression coefficients. In contrast to the variability within the HAY and GRG groups, Fig. 2 shows silage-based diets to be more uniform. The feeding system with the greatest effect on milk FA composition is GRG; increasing beneficial FA, such as CLA and C17:0 probably due to the contrasting impact of fresh grass and fermented silages on the rumen activities. As discussed, variability in forages within GRG records might explain outliers within this group (Fig. 2).
Since milk lipid composition depends on a combination of feeding, genetics, stage of lactation and seasonal variation 51 , accurate prediction of FA profile from these forage types is challenging and hard to interpret due to variability in botanical origin, maturation stage and conservation method, not to mention feeding management 41 . However, this study underlines the support that can come from multivariate approaches to predict milk lipidic nutrients, which can be bigger than that coming from univariate models based on farming variables. Further work in this area ought to allow a comprehensive whole system modelling of milk FA origin, accounting for  Figure 2. Box-Whisker plots of fatty acids (g/100 g of total fatty acids) according to the five feeding groups. HMS, high maize silage; MMS, medium maize silage; MCS, mixed crop silages; HAY, grass and lucerne hays; GRG , green grass. The box plots represent the following descriptive statistics: median (bar in box), mean (+, red cross), 25% (Q1) and 75% (Q3) quartile (bottom and top end of the box), minimum [Q1 − 1.5 × (Q3 − Q1)] and maximum [Q1 + 1.5 × (Q3 − Q1)] whiskers (lines outside), minimum and maximum values (•, full black circles), outliers with a distance to box of 1.5-3.0 times interquartile range (°, empty circles) or higher than 3 times interquartile range (*, asterisks). For the fatty acids abbreviations see Table 3. www.nature.com/scientificreports/