Estimation of milk yield based on udder measures of Pelibuey sheep using artificial neural networks

Udder measures have been used to assess milk yield of sheep through classical methods of estimation. Artificial neural networks (ANN) can deal with complex non-linear relationships between input and output variables. In the current study, ANN were applied to udder measures from Pelibuey ewes to estimate their milk yield and this was compared with linear regression. A total of 357 milk yield records with its corresponding udder measures were used. A supervised learning was used to train and teach the network using a two-layer ANN with seven hidden structures. The globally convergent algorithm based on the resilient backpropagation was used to calculate ANN. Goodness of fit was evaluated using the mean square prediction error (MSPE), root MSPE (RMSPE), correlation coefficient (r), Bayesian’s Information Criterion (BIC), Akaike’s Information Criterion (AIC) and accuracy. The 15–15 ANN architecture showed that the best predictive milk yield performance achieved an accuracy of 97.9% and the highest values of r2 (0.93), and the lowest values of MSPE (0.0023), RMSPE (0.04), AIC (− 2088.81) and BIC (− 2069.56). The study revealed that ANN is a powerful tool to estimate milk yield when udder measures are used as input variables and showed better goodness of fit in comparison with classical regression methods.

Traditionally sheep rearing in tropical regions of Latin America is focused on lamb production. However, recently, farmers have increased their interest on diversifying sheep products. Pelibuey is a breed used mainly for meat production as it has maternal ability and prolificity, and over the last decade its milk production performance has been explored 1 . Milk production and milk composition from Pelibuey sheep has been analyzed as pure breed 2 and as a crossbred with specialized dairy breeds 1 .The improvement of milk production in this breed requires milk yield recording and in order to favor an adequate productive performance, determination of udder morphological characteristics is needed 2 .
In sheep as in other dairy animals, the anatomical characteristics of udder are important for milk production and milkability 3 . The mammary gland morphology is related with milk production and aptitude for mechanical milking, which has already been used in genetic programs for dairy sheep 4,5 . For instance, Rovai et al. 6 suggested that sheep with greater milk yields showed large cisternal udders that were able to secrete high milk volumes. In the same line, McKusick et al. 7 reported that increasing one centimeter of udder circumference and udder height results in an increased from 0.06 to 0.11 L of milk in East Friesian ewes. This relationship allows to estimate milk yields from the udder measures.
Several approaches have been used to estimate milk production based on morphological measures of the udder and those have yielded to large differences in their predictive accuracy. The ability to estimate dairy performance depends on the number and type of udder measures, stage of lactation and analytical tools used to assess the udder-milk yield relationship 8 . Analyzing the relationships between udder morphology and milk yield from dairy ewes started back in the 1950s, and until now, all studies focused on this topic have analyzed this relationship using classical estimation methods 9,10 , without exploring new computational analyzing tools www.nature.com/scientificreports/ that increase the ability to explain milk yield performance based on udder measures. In this regard, most of the studies use multiple regressions as an approach to estimate milk production using udder measures as main predictor. However, these traditional approaches have several limitations such as sensibility for missing data, need of normal distribution data, and overparameterization when many predictors are used, which limits their use and fitting performance. The used artificial neural networks (ANN) are capable to deal with complex non-linear relationships between input and output variables, with no need for rigid a priori models 11 . The AAN is a nonlinear information processing system inspired by the biological nervous system. The ANN is based on interconnected elementary processing devices called neurons. The networking begins from the input information through one (or) more hidden layers, to the input layers. The ANN application has several advantages as good self-learning ability, nonlinearity, massive parallelism, easy implementation, adaptability, real-time operation, and fault tolerance ability 12,13 . The ANN has been used in animal science for classification and prediction purposes related to nutritional management 13,14 , dairy performance 15,16 , animal health 17 , genetic improvement 11,18 and processing of dairy products 19 . However, in ruminants there is no evidence of their used to estimate the relationship between milk yield and mammary gland measures in Pelibuey ewes.
The ANN are flexible models in terms of model assumptions and structure with accurate and precise predictions 13 . However, its use is challenged by defining optimal neuron network architecture, which implies that the determination of hidden units required to mimic the analyzed system using input-output examples is needed. The approach to this issue requires considering the relation between the model complexity and training performance. For example, using Akaikes's Information Criterion (AIC) and other criteria of goodness of fit allow choosing adequate ANN architecture 20 . Therefore, the aim of this study is to assess the different ANN architectures applied to udder measures from Pelibuey ewes to estimate milk yield and comparing them with predictions from linear regressions.

Results
Scatterplots, distributions and correlation coefficients of input and output variables are shown in Fig. 1. Must variables had a normal distribution, but the VDF and MY showed right skewness. The scatterplots showed significant linear relationship between the inputs CI (r = 0.66; p < 0.001) and VDF (r = 0.741; p < 0.001) with the output MY. Also, a significant correlation between CI and IW (r = 0.615; p < 0.001) and VDF (r = 0.661; p < 0.001) was found, which could determine the presence of multicollinearity in the MLR implementation. Table 1 shows the goodness of fit of MRL and the seven hidden configurations of the ANN. The proportion of the variance of MY explained by MRL was of 62.0%, with an accuracy of 90.6% and a value of RMSPE of 0.11 kg. The ANN with an architecture of two layers with five neurons (5-5) each one showed similar values of goodness of fit criteria and MLR accuracy. However, with the increase of neurons in both layers improved the goodness of fit. This finding indicated that modifying the number of hidden nodes influenced the model performance and its estimations. Figure 2 shows the best ANN architecture, which was structured by two-layers and 15 neurons each one .

Discussion
Our study assessed the usefulness of udder measures to estimate milk production on Pelibuey sheep using machine-learning techniques. The accurate prediction of milk yield is a fundamental step for implementing strategies to improve dairy performance in small ruminants. Using milk yield records, the ANN have been applied to estimate milk yield in dairy cows 15,21,22 , sheep 23 , goat 24,25 and buffalo 26 . The udder measures have been widely used to predict milk yield 2 ; however, these predictions have been carried out using traditional statistical tools such as multiple linear regression. The current study is the first reporting the use of ANN to estimate milk yield based on udder measures from sheep.
Several studies have compared the performance of traditional predictive tools (i.e., multiple linear regression and random forest) versus ANN to estimate milk yield 15,21 , predict mastitis 27 , forecasting cow locomotion score 28 , prediction of body weight 29,30 , modelling rumen fill 31 and prediction of carcass tissue composition 32 . Compared with MLR, results from the current study revealed a better performance of ANN to estimate milk yield based on udder measures. These findings agree with Bhosale and Singh 21 who reported that the performance of ANN (r 2 = 83.5) was better than the MLR (r 2 = 76.21) model for milk yield prediction. In addition, other authors have reported a better performance of ANN versus MLR when estimating milk yields from ruminants 15,33,34 .
In the current study, an increase of goodness of fit and accuracy was observed as number of nodes in the two hidden layers were increased. Which is explained by the fact that the number of nodes in the hidden layers defines the complexity and power of the neural network model to delineate underlaying relationships and structures inherent to the database. The number of nodes in the hidden layers should be large enough for the correct representation of the studied phenomena, but at same time low enough to have adequate generalizations. Several methods have been proposed to define the optimum number of hidden layer nodes. For instance, the approach proposed by Garson 35 35 approach determines an optimal number of layers to our database in a range of 6 to 12 nodes, which is markedly lower than the best architecture tested in the current study (n = 30).
Recently, Madhiarasan and Deppa 12 tested 151 different criteria of selection of the number of hidden neurons in back propagation networks using the RMSE, Mean Absolute Error (MAE), Mean Relative Error (MRE) and Mean Square Error (MSE) as error criteria to choose the optimal neuron network architecture. These authors  For the current database, the computed number of hidden neurons using the Madhiarasan and Deppa 12 approach was 36, which is similar to the best architecture tested in the current study. Therefore, we considered that an ANN architecture with 2 layers and 15 neurons in each one is the most appropriate for our database structure as it provides the best estimations without affecting its generalization capacity. Our best architecture had the highest number of hidden neurons, which is in accordance with Ince and Sofu 23 , who referred that a bigger network size would contribute to lower error levels in the training set. However, several authors referred that a more complex ANN architecture not necessarily represents an optimal solution for a given problem 15 . For this reason, the definition of the most appropriate architecture is the main step of the ANN implementation, which must considerer the context of research, database structure, type of variables, learning methods and objective of implementation (prediction or classification) 15,23 .
Overall, five udder measures were used to estimate the milk yield from Pelibuey sheep implementing two analytic approaches: multiple linear regression (MLR) and artificial neural networks (ANN). Both approaches were able to predict milk yield. However, the ANN showed better goodness compared with MLR. The accuracy of ANN to estimate milk yield depended on its neuron network architecture with increasing accuracy as the complexity of the ANN was increased. Therefore, the result of the current study reveals that ANN is a powerful tool to estimate milk yield when udder measures are used as an input variable and our data could be used or extrapolated for studies related to dairy ruminants. Further studies could use larger datasets and use them to perform different experiments including hyperparameter optimization by applying different approaches such as grid search, meta-heuristics, evolutionary methods and those based on nature-inspired algorithms.

Methods
Animal care, welfare and procedures were carried out according to the guidelines of the Animal Care Committee of the División Académica de Ciencias Agropecuarias, Universidad Juárez Autónoma de Tabasco under the approved ID project PFI: UJAT-DACA-2015-IA-02.

Database.
A total of 357 milk yield records with its corresponding gland udder measures was used in the present study. The records of a total of 38 multiparous Pelibuey ewes were analyzed in the current study. Ewes had an average body weight of 36.3 ± 4.9 kg with single (n = 33) and twin lambing (n = 5). Details about the nutritional, health and lamb management are specified by Arcos-Álvares et al. 2 . Ewes were milking manually once a day and udder measures were recorded twice a week. Udder measures were taken before and after milking and those were udder circumference (circumference at the udder medium area), udder width (distance between the widest lateral points of the udder), udder height (distance from the udder insertion to the lower extremity) and udder volume were calculated as follows: www.nature.com/scientificreports/ where V is the udder volume (cm 3 ), R is radius (cm), CP is circumference perimeter (cm), π is 3.14159265358979 and Ud is the udder depth (cm).
Model training. Arcos-Álvares et al. 2 using the same database of the current study to estimate milk yield through fit several multiple linear regression (MLR) models. To estimate daily milk yield (MY), these authors used a stepwise process to choose the best model using the following goodness of fit criteria: mean square prediction error (MSPE), root of MSPE (RMSPE) and coefficient of determination (r 2 ). For that, the inputs were initial udder circumference (CI), final udder heigh (FH), initial udder width (IW), final udder width (FW), and the difference between the initial and final udder volumes (VDF) as input variables. Arcos-Álvares et al. 2 used five models with the best goodness of fit following this equation: where MY is the daily milk yield and the input variables (CI, FH, IW, FW and VDF) as were described above.
The asterisks mean the level of significance of the regression coefficients: *p < 0.05; ***p < 0.000. In order to contrast MLR versus ANN the same inputs were used to carried out machine learning essay. Normalization is very important to build ANN since the database has different range and different units. The proposed approaches use the min-max normalization technique using the formula proposed by Zeng and Yeung 36 : where X i is the real input data, X ′ min is the least input data, X ′ max is the greatest input data, X ′ min is the least target value, X ′ max is the greatest target values. A supervised learning was used to train and teach the network using a two-layer ANN with seven hidden structures: 5-5, 5-10, 10-5, 10-10, 10-15, 15-10 and 15-15 neurons. Inputs vales are transferers to the hidden layer that multiplies weight W using tangent sigmoid activation function; the outputs of hidden layers get transferred to the output layers that multiplies with weight V using tangent sigmoid activation function 12 . A random 70% of the data was used to training and 30% for testing, the process was repeated 100 times. The globally convergent algorithm based on the resilient backpropagation was used to calculate ANN with a maximum step for training ANN of 1 × 10 7 using the neuralnet package in R software 37 . Performance assessment. The fit of ANN was compared with the best multiple regression model (MRM) fitted previously by Arcos-Alvarez et al. 2 using same input variables in both models. Goodness of fit was evaluated using: Mean square of prediction error (MSPE), calculated as: where e t is the residual for the milk yield of t sheep, n is the number of observations in the training dataset and p is the number of parameters of models. Root-mean-square prediction error (RMSPE): Akaike's Information Criterion (AIC) calculated as follows 38 : where n is the number of examples in the training dataset, LL is the log-likelihood for the model using the natural logarithm (e.g., the log of the MSPE), and k is the number of parameters in the model. Bayesian's Information Criterion (BIC) calculated as follows 39 : where n is the number of observations in the training dataset, LL is the log-likelihood for the model using the natural logarithm (e.g. log of the mean squared error), and k is the number of parameters in the model, and log() is the natural logarithm. Accuracy of estimations was calculated as follow: (1) R = CP 2π (2) V = πR 2 Ud (3) MY = −0.34 ± 0.66 * * * + 0.007 ± 0.001 * * * × CI + 0.009 ±0.002 * × FH + 0.02 ± 0.005 * * * × IW − 0.009 ± 0.004 * × FW + 0.0002 ± 0.00001 * * * × VDF