Post translational modifications of milk proteins in geographically diverse goat breeds

Goat milk is a source of nutrition in difficult areas and has lesser allerginicity than cow milk. It is leading in the area for nutraceutical formulation and drug development using goat mammary gland as a bioreactor. Post translational modifications of a protein regulate protein function, biological activity, stabilization and interactions. The protein variants of goat milk from 10 breeds were studied for the post translational modifications by combining highly sensitive 2DE and Q-Exactive LC-MS/MS. Here we observed high levels of post translational modifications in 201 peptides of 120 goat milk proteins. The phosphosites observed for CSN2, CSN1S1, CSN1S2, CSN3 were 11P, 13P, 17P and 6P, respectively in 105 casein phosphopeptides. Whey proteins BLG and LALBA showed 19 and 4 phosphosites respectively. Post translational modification was observed in 45 low abundant non-casein milk proteins mainly associated with signal transduction, immune system, developmental biology and metabolism pathways. Pasp is reported for the first time in 47 sites. The rare conserved peptide sequence of (SSSEE) was observed in αS1 and αS2 casein. The functional roles of identified phosphopeptides included anti-microbial, DPP-IV inhibitory, anti-inflammatory and ACE inhibitory. This is first report from tropics, investigating post translational modifications in casein and non-casein goat milk proteins and studies their interactions.

Post translational modifications in goat milk proteins. The annotations of peptides observing modifications including cysteine carbamidomethylation, oxidation, acetylation and phosphorylation in each protein are presented in Table S2. Post translational modifications were observed in a total of 201 peptide sequences corresponding to 120 Uniprot ID across the reference databases. PTMs were observed in high and low abundant milk proteins in each breed. High performance Q-Exactive LC-MS/MS analyses identified 287 sites of phosphorylation on 120 unique phosphoproteins. The Ser/Thr/Tyr/Asp ratio was 128:70:42:47, respectively. The phosphorylation at Asp residues is reported for the first time in goat milk proteins.
Isoform in different breeds. The identified 123 common UniprotKB ID proteins with varying level of phosphorylation across the 10 breeds were observed by comparing databases. The phosphosites and phosphoproteins observed in each breed are presented in Table S3 Similarly, the low abundant proteins were identified in different breeds with varying sites of phosphorylation. Proteins Proteoglycan 4 (PRG4) showed 8P isoform in Barbari, PIK3C2A, CEP152 observed 7P each and SLC1A6 1P in Barbari. Interferon tau BB4 (7P), CREBBP (5P), Cationic trypsin (8P) isoforms were observed in Jamunapari. PDZD9 and SMC5 observed exhibited 5P and 7P in Osmanabadi goats. Emilin-2 (3P), FAM222B (4P), MnSOD and OVOL1 each with 2P isoforms were observed in Sirohi goats.  Cationic trypsin S125; S127; S134; T130; S173; S175; T180; S181; S215 PRSS1 The interaction of proteins was analysed by STRING by MCL clustering of the identified proteome categorised the proteins in 16 clusters at highest confidence (>90%) score as depicted in Figure 1a. The network comprised of three major groups where keratin proteins shared distinct close interconnected cluster. Proteins found in defense response were also involved in response to stress, signal transduction and tissue development.
Functional analysis of PTM proteins. The GO functional annotations for PTM genes subset were enriched using DAVID 6.8 for classification into BP (87.5% genes), CC (94.4% genes) and MF (87.5% genes). The details of annotations are presented in Table S5. Enriched GO terms included developmental process (P value 0.02069), response to stress (P value 0.0099), response to drug (P value 0.0040), biogenesis (P value 0.048), structural  www.nature.com/scientificreports/  www.nature.com/scientificreports/ molecular activity (P value 5.19 E−04), protein dimerization activity (P value 0.0092) and extracellular space (P value 3.00 E−06). The phosphoprotein subset when analysed in Cytoscape Cluego identified 76 genes to generate a network with 6 functional groups at different levels. The details of annotations are presented in Table S6, Supplementary data file 1. The categorized functional groups were lactation, response to hydrogen peroxide, positive regulation of ossification, defense response to gram-positive bacterium, cornification and Systemic lupus erythematosus. Keratin proteins were associated with cornification, H2BC12, H2BC21 and KRT6A with antimicrobial humoral immune response mediated by antimicrobial peptide. Proteins BMPR2, NPPC, ZBTB16 found in defense response to gram-positive bacteria, CAD, CSN2, CSN3 in lactation and neurofilament in structural constituent of cytoskeleton. Estrogen signaling pathway was detected and proteins H3C1, H3C13, H3C15 were associated with pathways such as alcoholism, histone modifications, systemic lupus erythematosus (Fig. 1b).
The PTM gene subset exhibited a network comprising of 8 clusters in STRING (Fig. 1c) The keratin proteins clustered with PRSS1 (Trypsin); CSN2, CSN1S1, CSN3 and LALBA were identified in single cluster and histone proteins interacted closely with CREB-binding protein.
The identifiers (PTM genes) were analysed in Reactome database, where 61 identifiers were found in 332 pathways hit by at least one of them. The majority of proteins were associated with signal transduction, immune system, developmental biology and metabolism pathways. Pathways such as signaling by nuclear receptors, WNT, Notch and RHO GTPase were significant with p < 0.05 (Fig. 1d). The identified 17 phosphoproteins involved in signal transduction pathways included BMPR2, CCR2, CFL1, CREBBP, CSN2, H3FA, HIST1H2BK, HIST2H2BE,  HIST2H3A, HIST2H3D, HIST3H2BB, LAMC2, OR2K2, OR51D1 and PRG4. The details of pathways has been given in Table S7, Supplementary data file 1.

Discussion
Non-bovine milk is attracting the researcher's attention due to its nutrition and therapeutic applications. Goat milk, is leading in the area for nutraceutical formulation and drug development using goat mammary gland as a bioreactor. Goat milk has unique chemical, biochemical, physical and nutritional characteristics and has higher digestibility and lower allergenicity over cow milk 25,26 . Post-translational modifications (PTMs) of milk proteins contributed to their biological functions and their compositional complexity 27 . It has been commonly observed that phosphorylation of casein occurs at S or T amino acid residues in tripeptide sequences S/T-X-A, where X represents any AA residue and A is an acidic residue 5 . In the present study, we have identified phosphorylation occurring in S, T, Y, D amino acids and other post translational modifications as carboxymethylation, oxidation and acetylation. The phosphosites on Asp residues have been reported for the first time in goat milk. Post translational modification of proteins plays an important role to regulate the cellular processes, protein function, protein localization, and formation of protein complex. In eukaryotic cells, protein phosphorylation is among the most frequent post translational modifications 28 . Post-translational modifications in proteins are crucial for the activity state, localization and protein-protein interactions 29 . Therefore, molecular diversity of goat milk proteins needs to be explored to identify post translational modifications for studying potential biological role and protein-protein interaction. The present study focused on obtaining a comprehensive profile of the PTM sites of goat milk casein and non-casein protein and their interaction. Proteomics as a tool have been employed for the discovery, and characterization of post translational modifications such as phosphorylation, oxidation [30][31][32] . Electrospray ionization (ESI) mass spectrometry (MS) is suitable for studying PTM, including phosphorylation and glycosylation, since the technique provides molecular mass determination of native proteins. In this study, we analysed the goat milk proteins to identify both casein and non-casein PTM sites using nLC-MS/MS. We processed distinct protein spots by mass spectrometry for identification of phosphorylation, oxidation, acetylation and caramidomethylation. A peptidome of 201 peptide sequences with post translational modifications identified 86 proteins/120 UniprotKB accessions (Table S2). The phosphorylation site identified on the amino acids serine, threonine, tyrosine and aspartic acid was 128, 70, 42 and 47, respectively. Serine showed highest affinity for phosphate group in the present study and confirming earlier reports 18 .
Phosphorylation has been well characterized in the bovine caseins 33,34 . There are various reports targeting casein fractions in bovine and non-bovine milk by various proteomic approaches 32,[35][36][37][38][39] . The majority of bovine caseins exist in a phosphorylated form, and the phosphorylated residues vary from individual variants possessing one phosphorylated residue (κ-CN) to 13P for others (αS2 CN) 40 . Bijl and others 41 demonstrated that high αS1-CN-8P concentration in bovine milk is a great benefit for the production of uncooked curd cheese because αS1-CN-8P is hydrolyzed more efficiently by chymosin during ripening. Bovine proteome analysis showed more than 30 phosphorylated proteins which included 5 CSN2, 15 CSN1S1, 10 CSN1S2 and 4 CSN3 casein components 42 . Similarly, donkey milk showed 11 CSN3, 6 CSN1S1 and 3 CSN1S2 casein components 43 .
Goat milk proteins have been analysed using MS/MS [44][45][46][47][48][49] . The present study resulted in 105 phosphopeptides from casein and non-casein proteins from all the analysed samples. The conserved peptide sequence of (SSSEE) in casein was also observed in CSN1S1 and CSN1S2 at 1P, 2P and 3P. The phosphosites were identified in casein and whey proteins and on 45 other low abundance proteins. The total number of phosphosites observed in the major milk proteins associated with S, T, Y and D residues were 32, 18, 11 and 21, respectively. The phosphorylation sites observed for CSN2, CSN1S1, CSN1S2, and CSN3 were 11P, 13P, 17P and 6P, respectively (Table 1). However, whey proteins BLG showed 19 phosphosites (7D, 3Y, 5T, 4S) and LALBA showed 4 phosphosites (3D, 1Y). The identified phosphopeptides resulted in 12P, 8P, 11P, 5P, 13P and 4P isoforms of CSN1S1, CSN1S2, CSN2, CSN3, BLG and LALBA, respectively. Beta casein showed highest degree of variation in phosphorylation with identification of 17P sites and other PTM such as oxidation in the identified peptides. A higher number of casein phosphopeptides and phosphorylation sites are reported in the present study in comparison to previous study 18 .
Casein and whey proteins are post translationally modified by proteolysis by the milk enzymes, formation of disulphide bond by oxidation of cysteine, differential phosphorylation levels of serine and threonine, and glycosylation of threonine residues 50 . The phosphorylation degree of αS-casein is a prime factor affecting the technological properties of milk. Therefore, "signature peptides" and "caseome" analysis are being used to investigate adulteration in milk of different species 51 . The identification of cheese from different species has been authenticated by proteolytic peptides 52,53 . Therefore, proteome analysis of fermented milk products should be carried out due to their nutritional and health economic importance.
The present study reported 45 non casein phospho proteins assigned to various metabolic pathways ( Table 3). The identified PTM sites varied in milk samples of different goat breeds ( Table 4). Identification of low abundance proteins in milk is difficult as single step analysis fails to detect a large proportion of these proteins. Moreover, to overcome the limited entries in the caprine database, other reference database were used for identification of low abundance proteins. The varying levels and sites of phosphorylation in different breeds may be attributable to various physiological or environmental conditions under the influence of different agro-climatic regions. The phosphopeptides assigned to these proteins were mainly mono-or bi-phosphorylated ( Table 2).
The other post translational modifications such as acetylation, oxidation, and carbamidomethylation have also been reported ( www.nature.com/scientificreports/ functioning and their interaction. N-terminal acetylation increases peptide stability by preventing N-terminal degradation 54 . Peptides with carbamidomethylation are mainly used in peptide mass fingerprinting for identification and characterization of proteins 55 . In the present study, the protein-protein interaction was analysed to know about the functional properties of proteins (Fig. 1). The GO annotations were analysed using DAVID, network and interactions using Cytoscape and STRING and pathways analysis using Reactome database. The keratin proteins interacted with trypsin (PRSS1) and histone proteins interacted with CREB binding protein (CREBBP). The identified phosphoproteins were associated with lactation, response to stress, histone modification, cornification and signaling pathways (Tables S5,  S6, S7). It has been reported that caseins are associated with other secreted calcium (phosphate)-binding phosphoproteins, such as osteopontin, in milk 56 . Protein phosphorylation is vital for the regulation of metabolism, proliferation, inflammation, apoptosis, signaling and other important physiological processes. Autophosphorylation increases the catalytic efficiency of the receptor and provides binding sites for the assembly of downstream signaling complexes 57 . Caseins form micelles which vary from species to species, and when cleaved, generated bioactive peptides, having potential functions making them protein of interest. This gastrointestinal degradation may be the consequence of enzymatic hydrolysis, fermentation and other processes used in dairy production 58 .
The identification and characterization of phosphorylation sites are required to explore signaling networks of milk proteins. Phosphorylation-site provides definitive information on functional relationships between signaling proteins. The peptides released by enzymatic hydrolysis have specific biological functions due to their functional and interactions at cellular level 59 . The identified phospho bioactive peptides were mainly anti-microbial followed by ACE inhibitory, DPP-IV inhibitory, proliferating functions. Anti-oxidative, antioxidant, anxiolytic and hypocholesterolemic peptides were also confirmed from goat milk proteins (Fig. 1e). Non-bovine milk, for their health potential, economic value and the bioactive components/peptides, the milk protein fractions are being extensively investigated.
The goat milk protein genotypes have been observed in different breeds. Protein and casein content depend on allelic variants and breeds in different regions [60][61][62][63] . CSN1S1 gene acts as natural model and different genotypes occur due to interallelic combinations 64 . Indian goats have higher frequency of A and B alleles 65,66 . Sannen, Alpine and other European goat breeds have higher frequency of medium alleles like C, D, E, F 60,61,67,68 . Therefore, interallelic combination in casein complex leads to differential protein synthesis as well as other processing properties. Therefore, including different breeds/genotypes will definitely affect the proteome identification and post translational modifications pattern.

Conclusion
Proteomic analysis has been carried out to study human, bovine and non-bovine milk for nutrition and therapeutic applications. Phosphorylation has been well characterized due to several technical, practical and bioinformatics approach. The present study identified 201 peptides showing post translational modifications in goat milk. The phosphorylation at Asp residues is reported first time in goat milk proteins. The rare conserved peptide sequence of (SSSEE) in casein was observed in casein phosphopeptides (CSN1S1, CSN1S2). Phosphorylation regulates protein functions, such as biological activity, interaction and stabilization by causing conformational changes in the protein. Therefore, the identification of the post-translational modifications of the milk proteome has become necessary and may provide newer insight to extend the milk proteome and its potential biological role.

Materials and methods
The work flow of the present study has been depicted in Fig. 2.
Milk collection and description of genetic stock. Goat milk samples were collected from 1240 animals belonging to 10 goat breeds/ genotypes during the postpartum days 30-65. Milk samples were collected from natural habitats of the breeds belonging to different geographical and agro-climatic regions of India. Samples were collected between 7:30 and 9:00 hrs. by hand milking. After disinfecting the udder, 30-40 ml of milk was collected directly to the collection tubes, transported to the laboratory at 4ºC within 24 hrs. and stored at -20ºC. Milk subsamples for protein analysis were stored at -40ºC until further analysis. All sample collection was conducted in accordance with institutional practice and the study was approved by Institutional animal ethics committee (IAEC).
The breeds analysed in the study belonged to arid, semi-arid, humid, coastal and mountain regions with different grazing conditions (Fig. 3). The details of samples collected with the description of natural habitats from each breed are presented in Table S1, Supplementary data file 1. The animals were apparently healthy and the body condition score was satisfactory (3)(4). The animals are mainly reared under semi-intensive system depending mostly on field grazing and supplementation of dry fodder, concentrate and mineral mixture.
Gel based milk protein analysis. The milk protein variants were analysed by SDS-PAGE and details have been described elsewhere 44 . The skimmed milk samples were centrifuged at 12000g, 4ºC, 15 min to obtain clear transparent aqueous layer. This layer was separated, quantified and reduced in Laemmli's sample buffer for analysis by SDS-PAGE. Milk proteins were resolved on 14% SDS and urea PAGE using 8 µg in each lane. The gels were stained with commasie brilliant blue (R250) and subsequently scanned in the Gel documentation system (Alpha Innotech Corporation, USA). Milk protein variants were identified by comparing the allelic variation with reference samples (confirmed by sequencing) by determining the molecular weight.
The protein variants (n=21) identified in the 10 breeds using SDS-PAGE were selected for analysis by 2DE. Individual protein samples were subjected to in-gel rehydration of IPG strips (SERVA IPG Bluestrip 3-10 NL/7cm). Rehydration was carried out at room temperature for 16 hrs. with protein sample diluted in rehydration www.nature.com/scientificreports/   Nano-liquid chromatography mass spectrometry (nLC-MS/MS). The details of nLC-MS/MS analysis has been described elsewhere 44 and used with a minor modifications for PTM analysis.
a. Protein digestion The silver stained protein spots in gels were excised carefully with help of a sterilized spatula and transferred into separate vials. Destaining and acetone precipitation was carried out. In-gel digestion was performed; samples were reduced with 5 mM TCEP (Tris 2-Carboxyethyl Phosphine) at 55°C for 1 hour and further alkylated with 50 mM iodoacetamide for 30 min at room temperature in the dark. The gel spots were shrinked with acetonitrile and air-dried for few minutes at room temperature followed by digestion with trypsin (1:50, trypsin/lysate ratio) for 16 hours at 37°C. Digests were dried using speed vac for 1 hour and pellet was dissolved in buffer A (5% acetonitrile, 0.1% formic acid). Digests were cleaned by Sep-Pak. Titanium method was performed for phospho binding. Digests were cleaned up again using C18 silica cartridge (The Nest Group, Southborough, MA) following manufacturer's protocol and dried using speed vac. The desalted dried pellet was reconstituted in buffer A (5% acetonitrile, 0.1% formic acid).

b. Liquid chromatography mass spectrometry analysis
All the experiments were performed using EASY-nLC 1000 system (Thermo Fisher Scientific) coupled to Q Exactive mass spectrometer (Thermo Fisher Scientific, Germany) equipped with nano electrospray ion source. Peptide mixture (1.0 µg) was loaded on a precolumn and was resolved using 5 cm PicoFrit column (360 µm outer diameter, 75 µm inner diameter, 10 µm tip) filled with 1.9 µm of C18-resin (Dr Maeisch, Germany). The peptides were loaded with buffer A and eluted with a 0-40% gradient of buffer B (95% acetonitrile, 0.1% formic acid) at a flow rate of 500 nl/min for 10 min. The QExactive was operated using the Top10 HCD data-dependent acquisition mode with a full scan resolution of 70,000 at m/z 400. MS/MS scans were acquired at a resolution of 17500 at m/z 400. Lock mass option was enabled for polydimethylcyclosiloxane (PCM) ions (m/z = 445.120025) for internal recalibration during the run. MS/MS data was acquired using a data-dependent top 10 method dynamically choosing the most abundant precursor ions from the survey scan.

c. Protein identification and PTM analysis
The .raw files generated were analyzed using Proteome Discoverer (v2.2) against the in-house Uniprot reference proteome database (Capra hircus, Ovis aries, Homo sapiens and Bos taurus). Due to scarcity of available protein annotations for goat, the analysis was carried out with goat, human, sheep and cow database with 7056, 20117, 27666, and 23869 entries respectively. For Sequest search, the precursor and fragment mass tolerances were set at 10-15 ppm and 0.5 Da, respectively. The protease used to generate peptides, that is the enzyme specificity was set for trypsin/P (cleavage at the C terminus of "K/R": unless followed by "P") along with maximum missed cleavages value of 2. Carbamidomethyl on cysteine as fixed modification and oxidation of methionine and N-terminal acetylation and phosphorylation (S, T, Y, D) were considered as variable modifications for database search. Peptide spectrum match and protein false discovery rate (FDR) were set to 0.01. Functional analysis. All the identified milk proteins were assigned their gene symbol via the Uniprot knowledgebase (http://www.unipr ot.org/). Protein classification of the identified proteome and PTM genes subset were performed based on their functional annotations using Gene Ontology (GO) for biological process, subcellular localization and molecular function using the Database for annotation, visualization and integrated discovery (DAVID) version 6.8. The enrichment was performed with Homo sapiens database in background for the identified gene names and measured by fisher exact test in DAVID system.
The analysis for the PTM subset was performed using Cytoscape v.3.8.1. Protein interaction networks, biological pathways and protein clusters with Homo sapiens as reference database were generated using cluego+cluepedia plugin. Analyses were carried out with a significance level of 0.05 using a hypergeometric test and the Benjamini & Hochburg false discovery rate correction.
Protein interaction networks were analysed using search tool for the retrieval of interacting genes/proteins (STRING version 11.0). STRING networks were calculated at highest confidence score of 0.900 for the entire set of milk proteins and for PTM protein subsets and interactions were clustered using MCL algorithm. The gene search for network was performed in Homo sapiens reference database.
The peptides showing modifications were grouped based on the length as small (<7AA), medium (7-25 AA) and long (>25AA). The functions of these peptides were determined by Milk bioactive peptide database (MBPDB) 69 at threshold of 80% identity match for their known functions. www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.