A theoretical estimate for nucleotide sugar demand towards Chinese Hamster Ovary cellular glycosylation

Glycosylation greatly influences the safety and efficacy of many of the highest-selling recombinant therapeutic proteins (rTPs). To define optimal cell culture feeding strategies that control rTP glycosylation, it is necessary to know how nucleotide sugars (NSs) are consumed towards host cell and rTP glycosylation. Here, we present a theoretical framework that integrates the reported glycoproteome of CHO cells, the number of N-linked and O-GalNAc glycosylation sites on individual host cell proteins (HCPs), and the carbohydrate content of CHO glycosphingolipids to estimate the demand of NSs towards CHO cell glycosylation. We have identified the most abundant N-linked and O-GalNAc CHO glycoproteins, obtained the weighted frequency of N-linked and O-GalNAc glycosites across the CHO cell proteome, and derived stoichiometric coefficients for NS consumption towards CHO cell glycosylation. By combining the obtained stoichiometric coefficients with previously reported data for the specific growth rate and productivity of CHO cells, we observe that the demand of NSs towards glycosylation is significant and must therefore be accounted for to better understand the burden of glycosylation on cellular metabolism. The estimated demand of NSs towards CHO cell glycosylation can be used to rationally design feeding strategies that ensure optimal and consistent rTP glycosylation.


Materials and Methods
In order to obtain an improved estimate for the demand of NSs towards CHO cell glycosylation, we required an estimate of (i) the relative abundance of CHO HCPs, (ii) the weighted frequency of N-linked and O-GalNAc glycan sites per protein, (iii) the average glycoform distribution of each type, and (iv) NS demand for glycolipid synthesis. The workflow employed to obtain each of these components and the stoichiometric coefficients of NS consumption towards CHO HCP glycosylation are detailed in Fig. 1.
Step 1: CHO HCP relative quantification. The reported proteome of CHO cells 19 was compared with the SwissProt database 27 using BLAST 28 to obtain the closest reviewed homolog for each HCP. The sequence, length ($Len_i$), and molecular weight ($MW_i$) of each homolog were also retrieved from SwissProt. The amino acid lengths and the reported spectral counts 19 ($SC_i$) were then used to calculate the relative abundance ($Z_i$) of each CHO HCP using Eqs 1 and 2, as described previously 29,30 . With the obtained values of $Z_i$, the weighted average amino acid length ($Len_{HCP}$) and molecular weight ($MW_{HCP}$) of CHO HCPs were calculated using Eqs 3 and 4. The list of all reported HCPs, the obtained homologs, and the calculations of $Z_i$, $Len_{HCP}$, and $MW_{HCP}$ are presented in Supplementary Table ST1. The weighted relative abundance of the amino acids present in CHO HCPs ($x_{Aac_j}$) was obtained with Eq. 5, using the frequency of each amino acid in the sequence of each homologous protein ($n_{Aac_j}$) and $Z_i$. The calculation of $x_{Aac_j}$ across the whole CHO proteome is presented in Supplementary Table ST2.
Figure 1. Step 1: the CHO proteome 19 was BLASTed 28 against the SwissProt database 27 to obtain the corresponding confirmed homologous proteins. The length, amino acid sequence, molecular weight, and potential N-linked glycosylation sites of each homologous protein were extracted from the database. Using the protein length and the spectral counts available from the literature 19 , the relative abundance of each homologous protein ($Z_i$) was obtained 29,30 (Eq. 2). With $Z_i$, the weighted average protein length ($Len_{HCP}$), molecular weight ($MW_{HCP}$), and amino acid content were calculated (Eqs 3-5).
Step 2: the amino acid sequences of potentially N-glycosylated homologous proteins were analysed with the NetNGlyc 1.0 server 32 [...] 35 , and a mAb molecular weight ($MW_{mAb}$) of 148,545 g/mol. All data and calculations are presented in Supplementary Tables ST1-ST10.
Scientific Reports | 6:28547 | DOI: 10.1038/srep28547
Step 2: N-linked glycan sites per protein. The vast majority (>90%) of all reported N-linked glycosites in SwissProt have not been experimentally confirmed to be occupied 31 . To refine the estimate further by accounting for macroheterogeneity (variations in the N-glycan occupancy of each potential site), the amino acid sequences of all potential N-linked glycoproteins were analysed using the NetNGlyc 1.0 server (http://www.cbs.dtu.dk/services/NetNGlyc/) 32 . To obtain a conservative estimate, only the N-linked glycosites scored within the upper thresholds (++ and +++) were considered to be occupied. The number of occupied N-linked glycosites on each protein ($S_{Nglyc,i}^{HCP}$) was multiplied by that protein's relative abundance ($Z_i$) and then summed to obtain the weighted number of N-linked glycosites across the entire CHO proteome ($S_{Nglyc}^{HCP}$), as shown in Eq. 6. All calculations performed to estimate $S_{Nglyc}^{HCP}$, along with the contribution of each predicted N-glycoprotein to the weighted number of N-glycosites, are presented in Supplementary Table ST3.
[...] 27 . However, a recent study by Yang and collaborators 21 found 1,548 O-GalNAc glycosites in the proteome of CHO cells alone, suggesting that O-GalNAc glycosylation may be underreported in SwissProt. The low frequency of reported O-GalNAc sites is possibly due to the challenging analytical methods required for mapping, identifying, and quantifying these glycan types, which lack a consensus amino acid sequon, may have core structure variability, and cluster densely in Ser/Pro/Thr-rich domains 33 .
In order to obtain the frequency of these post-translational modifications, the reported O-GalNAc glycoproteome of CHO cells was used 21 [...] , and $x_{Oglyc,m}^{HCP}$, respectively) presented in Fig. 2 was multiplied by the frequency with which the monosaccharide species appears in each glycan ($n_{NS_k,l/m}$), as shown in Eqs 8 and 9. Each monosaccharide present on the reported HCP glycans was assumed to be sourced from the respective nucleotide sugar (NS). All N-linked HCP glycans were assumed to contain three glucose residues (sourced from uridine diphosphate glucose, UDP-Glc) and nine mannose residues (sourced from guanosine diphosphate mannose, GDP-Man), given that all N-linked glycans are synthesised from the Glc3Man9GlcNAc2 precursor oligosaccharide depicted in the top left corner of Fig. 2. These calculations are detailed in Supplementary Tables ST5 and ST6.
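The weighting described for Eqs 8 and 9 can be illustrated with a minimal Python sketch. It assumes that the average demand for each monosaccharide is the abundance-weighted sum of its residue count over all glycoforms, and that every N-linked glycan is charged three glucose and nine mannose residues from the precursor, as stated above. The function name and the two-glycoform distribution are hypothetical and are not taken from Fig. 2 or the supplementary tables.

```python
from collections import defaultdict

# Residues charged to every N-linked glycan because of the Glc3Man9GlcNAc2
# precursor (assumption stated in the text).
PRECURSOR_OVERRIDE = {"Glc": 3, "Man": 9}


def weighted_monosaccharides(glycoforms, n_linked=True):
    """Average number of each monosaccharide consumed per glycan.

    glycoforms: list of (relative_abundance, {monosaccharide: residues}).
    Abundances are normalised, so they do not need to sum to one.
    """
    total = sum(x for x, _ in glycoforms)
    demand = defaultdict(float)
    for x, residues in glycoforms:
        counts = dict(residues)
        if n_linked:
            # Replace final Glc/Man counts with the precursor requirement.
            counts.update(PRECURSOR_OVERRIDE)
        for sugar, n in counts.items():
            demand[sugar] += (x / total) * n
    return dict(demand)


# Hypothetical distribution, for illustration only.
example = [
    (0.6, {"GlcNAc": 4, "Man": 3, "Fuc": 1}),            # agalactosylated
    (0.4, {"GlcNAc": 4, "Man": 3, "Fuc": 1, "Gal": 1}),  # monogalactosylated
]
f = weighted_monosaccharides(example)
```

Passing `n_linked=False` skips the precursor override, which is how an O-GalNAc distribution would be handled under the same assumptions.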
The weighted average monosaccharide composition of mAb glycans ($f_{NS_k}^{Nglyc,mAb}$) was obtained by combining the relative abundance of mAb constant fragment (Fc) N-glycans ($x_{Nglyc,q}^{mAb}$) reported for trastuzumab 23 (Fig. 2) with the frequency of each monosaccharide residue in these Fc carbohydrates ($n_{NS_k,q}$), as shown in Eq. 10. The Fc glycoform distribution of trastuzumab 23 was used because product glycan data are not reported in the work cited herein for the estimation of NS consumption between host cell and product glycosylation 24-26 . Furthermore, data for trastuzumab glycosylation were selected for the calculation of $f_{NS_k}^{Nglyc,mAb}$ given that this product contains glycans that are representative of almost all commercially available mAbs (complex biantennary with varying degrees of fucosylation and galactosylation 1,2,4 ) and can be considered a 'model' CHO-derived product due to its isotype (IgG1, the most common among therapeutic mAbs 4 ) and the fact that it has been manufactured using CHO cells for nearly two decades 3 . As for HCP glycans, all mAb Fc N-linked glycans were assumed to require three glucose and nine mannose residues for their formation. All calculations performed to obtain $f_{NS_k}^{Nglyc,mAb}$ are presented in Supplementary Table ST7. The monosaccharide content of total CHO cell lipids relative to total protein content ($F_{NS_k}^{GL}$) was obtained from the work by Briles and collaborators 22 . With these values, the distribution of CHO cell glycosphingolipids (Fig. 2) was calculated as shown in Supplementary Table ST8.
[...] , as shown in Eq. 11. The stoichiometric coefficient for NS demand towards CHO HCP O-GalNAc glycosylation was calculated in a similar fashion, as presented in Eq. 12. For these calculations, the dry weight of CHO cells reported by Carinhas et al. 34 ($M_{cell}$ = 271 pg/cell) and a cellular protein content of $Z_{prot}$ = 74.2% 35 were used.
The value for dry cell weight was selected because it is representative of those that have previously been reported for CHO cells 36,37 , and $Z_{prot}$ = 74.2% was selected given that similar values have been reported for several mammalian cell types, including CHO 34-38 . Although different assays have been used to quantify $Z_{prot}$ (e.g. Bradford 34 and Biuret 38 ), the reported values typically range between 70% and 80%. The stoichiometric coefficient for NS consumption towards lipid glycosylation ($\nu_{NS_k}^{GL}$) was obtained by multiplying the monosaccharide content of CHO cell lipids relative to total protein content ($F_{NS_k}^{GL}$) 22 by the above-mentioned values for $M_{cell}$ and $Z_{prot}$, as shown in Eq. 13. The calculations to obtain $\nu_{NS_k}^{NglycHCP}$, $\nu_{NS_k}^{OglycHCP}$, and $\nu_{NS_k}^{GL}$ are shown in Supplementary Table ST9.
The stoichiometric coefficients for NS demand towards mAb Fc N-linked glycosylation ($\nu_{NS_k}^{Nglyc,mAb}$) were calculated with Eq. 14 using the values of $f_{NS_k}^{Nglyc,mAb}$ that were obtained as described in Step 4. The mAb was assumed to contain two N-linked glycosites on its Fc ($S_{Nglyc}^{mAb} = 2$). The molecular weight of the mAb was calculated using the amino acid sequence of trastuzumab and its weighted average glycan composition 23 to yield a value of $MW_{mAb}$ = 148,545 g/mol. The calculation of $\nu_{NS_k}^{Nglyc,mAb}$ is presented in Supplementary Table ST9.
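As a rough illustration of how the quantities above could combine, the Python sketch below assumes plausible forms for Eqs 11, 13, and 14, which are not reproduced in this text: monosaccharides per glycan, times glycosites per protein, times moles of protein. The constants $M_{cell}$, $Z_{prot}$, $MW_{mAb}$, and the two Fc glycosites come from the text. The function names, the units assumed for the lipid term (nmol per mg protein), and the placeholder HCP molecular weight of 45,500 g/mol (about 413.6 residues at roughly 110 g/mol each) are assumptions; the actual $MW_{HCP}$ is calculated in Supplementary Table ST1.

```python
M_CELL_PG = 271.0     # dry cell weight, pg/cell (from the text)
Z_PROT = 0.742        # protein fraction of dry cell weight (from the text)
MW_MAB = 148_545.0    # mAb molecular weight, g/mol (from the text)
S_NGLYC_MAB = 2       # Fc N-glycosites per mAb (from the text)


def protein_ug_per_million_cells():
    # pg/cell is numerically equal to ug per 10^6 cells.
    return M_CELL_PG * Z_PROT


def nu_hcp(f_k, sites_per_hcp, mw_hcp):
    """Assumed form of Eq. 11/12: nmol NS per 10^6 cells."""
    nmol_hcp = protein_ug_per_million_cells() / mw_hcp * 1e3  # umol -> nmol
    return f_k * sites_per_hcp * nmol_hcp


def nu_lipid(f_gl_nmol_per_mg_protein):
    """Assumed form of Eq. 13: nmol NS per 10^6 cells."""
    protein_mg = protein_ug_per_million_cells() / 1e3
    return f_gl_nmol_per_mg_protein * protein_mg


def nu_mab(f_k):
    """Assumed form of Eq. 14: nmol NS per ug of secreted mAb."""
    return f_k * S_NGLYC_MAB / MW_MAB * 1e3


# Illustrative call: one residue per glycan, 0.0809 N-glycosites per HCP
# (value reported in the results), placeholder MW_HCP.
example_nu = nu_hcp(f_k=1.0, sites_per_hcp=0.0809, mw_hcp=45_500.0)
```

Expressing the mAb coefficient per microgram of product keeps it directly compatible with a specific productivity given in pg/cell/day, which is numerically the same as micrograms per million cells per day.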
Other types of cellular glycosylation. Although other types of glycosylation exist in mammalian cells, they have not been considered to greatly impact cellular NS demand due to their low abundance relative to the amount of HCP N-linked and O-GalNAc glycans and glycosphingolipids 39,40 . For example, O-mannose and O-fucose linked glycans have indeed been observed in the CHO cell glycoproteome, but at nearly undetectable levels 20 .
Similarly, dolichol pyrophosphate-linked oligosaccharides (Dol-PP-OS) have also been reported to be present in CHO cells 41 , but it is likely that NS demand towards this glycolipid pool is small compared to that of HCP and glycosphingolipid glycosylation. Intracellular accumulation of Dol-PP-OSs has been reported to depend on the availability of dolichol phosphate (Dol-P) 42 , which in turn, has been identified as a small fraction (only a few percent) of total cellular lipids in rat and mouse tissues 43 . In addition, the majority of monosaccharide constituents of the Dol-PP-OS precursor are cleaved off during glycan processing within the Golgi apparatus and are likely recycled for de novo synthesis of Dol-PP-OS. Therefore, it is possible to assume that NS consumption towards Dol-PP-OS synthesis is low compared to glycosphingolipid and HCP glycosylation and is, thus, negligible within our proposed framework.
Proteoglycans. Proteoglycans, which contain linear carbohydrate chains constituted by tandem disaccharide motifs (glycosaminoglycans), are more commonly associated with cell adhesion in connective tissue 44 , and thus, may not be prevalent in CHO cells 39 . In order to evaluate whether this potential source of NS consumption is low compared to cellular N-linked, O-GalNAc, and glycosphingolipid glycosylation, the frequency of this glycosylation type was estimated based on the CHO proteome quantification described in Step 1 above.
Specifically, the database of closest confirmed homologs to the CHO proteome (ST1) was queried for the presence of the forty-three known proteoglycans 44 . The weighted average frequency of glycosaminoglycan (GAG) sites per CHO HCP ($S_{GAG}^{HCP}$) was obtained by multiplying the reported number of GAG sites per proteoglycan ($S_{GAG,i}^{HCP}$) by the relative abundance of the corresponding protein ($Z_i$) and summing this product across the entire CHO proteome, as shown in Eq. 15. These calculations are shown in Supplementary Table ST10.
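The abundance-weighted sum used here (and, in the same way, for the N-linked and GPI site frequencies) can be sketched in a few lines of Python. The sketch assumes that the relative abundance from Step 1 follows the usual length-normalised spectral count form, $Z_i = (SC_i/Len_i) / \sum_j (SC_j/Len_j)$; Eqs 1 and 2 are not reproduced in this text, so that form is an assumption. The function names and the three-protein dataset are hypothetical.

```python
def relative_abundance(spectral_counts, lengths):
    """Length-normalised spectral counts, scaled to sum to one (assumed form)."""
    ratios = [sc / length for sc, length in zip(spectral_counts, lengths)]
    total = sum(ratios)
    return [r / total for r in ratios]


def weighted_average(values, z):
    """Abundance-weighted sum, e.g. sites per HCP or average protein length."""
    return sum(v * zi for v, zi in zip(values, z))


# Hypothetical three-protein proteome, for illustration only.
spectral_counts = [100, 50, 10]
lengths = [400, 500, 200]      # amino acid residues
gag_sites = [0, 2, 1]          # reported sites per protein

z = relative_abundance(spectral_counts, lengths)
sites_per_hcp = weighted_average(gag_sites, z)   # weighted site frequency
mean_length = weighted_average(lengths, z)       # weighted average length
```

Because the weights sum to one, the result reads directly as sites per average host cell protein; multiplying by 100 gives the "sites per 100 HCPs" unit used in the results.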

Glycosylphosphatidylinositol (GPI) anchors.
The glycosylphosphatidylinositol (GPI) anchor, which attaches proteins to the cell membrane, also contains a carbohydrate component. The amount of free GPI and GPI-anchored proteins has also been reported to be low compared to N-linked, O-GalNAc, and glycosphingolipid glycosylation 39,40 . In order to estimate the abundance of GPI glycans in CHO cells, the number of potential GPI-anchored CHO HCPs ($S_{GPI,i}^{HCP}$) was obtained by submitting the amino acid sequences of all confirmed homologous CHO HCPs (ST1) to the PredGPI GPI-anchor prediction server (http://gpcr.biocomp.unibo.it/predgpi/pred.htm) 45 . The obtained $S_{GPI,i}^{HCP}$ values were then multiplied by the relative abundance of each corresponding HCP ($Z_i$) and summed (Eq. 16) to obtain the weighted average frequency of GPI-anchored HCPs ($S_{GPI}^{HCP}$) across the CHO proteome. These calculations are detailed in Supplementary Table ST10.

Demand of NSs towards cellular and mAb glycosylation.
Considering that the demand for NSs is partitioned between HCP, lipid, and mAb glycosylation, two limits can be identified: (i) when cell growth greatly exceeds mAb productivity and (ii) when mAb productivity considerably outweighs growth. Both scenarios could be observed in a typical fed-batch culture: during the exponential growth phase, the demand of NSs towards cellular glycosylation would predominate, whereas during the stationary phase, the demand of NSs towards cellular glycosylation would decrease considerably and the demand towards mAb glycosylation would become significant. In order to cover both possibilities, data for the specific growth rate ($\mu_g$) and specific mAb productivity ($q_p$) corresponding to a breadth of productivity to growth ratios reported for industrially relevant CHO cells have been retrieved from recent publications 24-26 .
Specifically, two sets of values for $\mu_g$ and $q_p$ were extracted from each reference, the first corresponding to a low productivity to growth rate ratio (Dool-L 24 , Chus-L 25 , and Kant-L 26 ), and the second corresponding to a high productivity to growth ratio (Dool-H 24 , Chus-H 25 , and Kant-H 26 ) (Table 1). All $\mu_g$ and $q_p$ values except those obtained from Kant-H correspond to industrial CHO cell lines continuously expressing a recombinant mAb while growing exponentially. The selected $\mu_g$ and $q_p$ values are considered typical for industrial CHO bioprocesses ($\mu_g$ between 0.03 h⁻¹ and 0.05 h⁻¹ and $q_p$ between 20 and 70 pg/cell/day) 46,47 . The values for Kant-H correspond to engineered CHO cells undergoing stationary growth while being cultured under mild hypothermic conditions and in the presence of sodium butyrate. The Kant-H values were selected to have an extremely high $R_{q_p/\mu_g}$ that would be representative of the limit for scenario (ii) described above ($q_p$ = 94.05 pg/cell/day and $\mu_g$ = 0.011 h⁻¹). We must note that a $q_p$ value of 100 pg/cell/day is commonly viewed as the upper threshold for CHO cell mAb specific productivity 48 . In all references, $q_p$ was determined based on extracellular measurements of mAb titre. Chusainow [...] Supplementary Table ST11.
Productivity to growth ratios ($R_{q_p/\mu_g}$)
Cellular glycosylation calculator. Although literature-derived experimental data have been used to perform all calculations reported herein, we must highlight that the supplementary file is a set of integrated spreadsheets that is amenable to analysis and modification by users to include experimental data for specific rTGP-producing CHO cell cultures. Users may readily adjust the relative abundance of HCP N-linked and O-GalNAc glycans (ST5 and ST6), the glycoform distribution of the recombinant product (ST7), the molecular weight and number of N-linked glycosites on the product (ST9), and the specific growth rate and productivity (ST11) to obtain stoichiometric coefficients for NS consumption towards recombinant product and cellular glycosylation for any system. These coefficients can, in turn, be used to estimate the partition of NS consumption towards cellular and rTGP glycosylation for user-defined CHO cell culture data and can also be used as inputs for models of CHO cell metabolism.
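The partition of NS consumption between cellular and product glycosylation can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the calculation in the supplementary spreadsheets: the cellular flux is taken as a per-cell stoichiometric coefficient multiplied by the specific growth rate, the product flux as a per-mass coefficient multiplied by the specific productivity, and the productivity to growth ratio is expressed in pg/cell after converting the growth rate to a per-day basis. The Kant-H values of $q_p$ and $\mu_g$ are those quoted above; the two coefficients are hypothetical.

```python
HOURS_PER_DAY = 24.0


def productivity_to_growth_ratio(qp_pg_cell_day, mu_g_per_h):
    """Ratio of specific productivity to specific growth rate, in pg/cell."""
    return qp_pg_cell_day / (mu_g_per_h * HOURS_PER_DAY)


def flux_partition(nu_cell, nu_mab, qp_pg_cell_day, mu_g_per_h):
    """Fractions of NS flux going to cellular and to mAb glycosylation.

    nu_cell: nmol NS per 10^6 cells (HCP plus lipid glycosylation).
    nu_mab:  nmol NS per ug mAb (pg/cell/day equals ug per 10^6 cells per day).
    """
    cell_flux = nu_cell * mu_g_per_h * HOURS_PER_DAY  # nmol/10^6 cells/day
    mab_flux = nu_mab * qp_pg_cell_day                # nmol/10^6 cells/day
    total = cell_flux + mab_flux
    return cell_flux / total, mab_flux / total


# Kant-H operating point from the text; coefficients are hypothetical.
r_kant_h = productivity_to_growth_ratio(94.05, 0.011)
to_cell, to_mab = flux_partition(
    nu_cell=0.5, nu_mab=0.012, qp_pg_cell_day=94.05, mu_g_per_h=0.011
)
```

Replacing the hypothetical coefficients with values from Table 2 and the operating points from Table 1 would give the kind of flux distributions discussed in the results.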

Results and Discussion
Relative abundance of CHO cell proteins. The most highly abundant individual proteins are those associated with the cytoskeleton (e.g. γ-actin, β-actin, β-tubulin, α-tubulin, vimentin), glycolysis (glyceraldehyde-3-phosphate dehydrogenase, fructose-bisphosphate aldolase A), chromatin (e.g. histones H4 and H2B), ribosomes (e.g. 40S ribosomal proteins S12 and S3A), and those involved in protein transcription, translation, and folding (e.g. heat shock cognate 71 kDa protein, eukaryotic translation initiation factor 5A-1, elongation factor 1α-1, and peptidyl-prolyl cis-trans isomerase A). A full list of the CHO cell proteome including the relative abundance of each protein is presented in Supplementary Table ST1. Figure 3A compares the most abundant categories (as obtained through relative quantification and gene ontology analysis) of CHO HCPs with those obtained from protein relative abundance data of a murine ovary cell line (http://pax-db.org/) 49,50 . Both datasets are in relative agreement for cytoskeletal, chromatin, and glycolysis and respiration proteins. Large deviations can be observed for HCPs involved in protein transcription, translation, and folding. Specifically, HCPs involved in protein biosynthesis (transcriptional, translational, and ribosomal proteins) are considerably more abundant in CHO cells than in the murine ovary cell line. These differences are likely to be species-specific and are consistent with the phenotypic traits (rapid proliferation and high protein productivity) that characterise CHO cells.
In order to further validate the proteome-based quantification, the relative abundance of amino acids in the CHO HCPs was compared with previously reported data on murine 38 and CHO 51 HCP amino acid content (Fig. 3B). In general, the trends in amino acid content are similar across all datasets. When compared with the murine hybridoma data, the proteome-derived amino acid distribution diverges only slightly, with the highest observed deviations being for glutamate (+2.0%), proline (−1.7%), and lysine (+1.6%). These differences could be attributed to species-specific variations. When compared to the CHO dataset, the proteome-derived amino acid frequency diverges more. Particular differences can be seen for glutamate (−4.1%), aspartate (−2.9%), and lysine (−1.5%). As the major deviations with respect to the CHO dataset are observed for amino acids that are products of deamination, this result could be an artefact of the analytical method. Another possibility is that the analytical method used by Selvarasu et al. 51 quantified total amino acid concentration (free amino acids plus those contained in protein). Because glutamate and aspartate are the key amino acids involved in the aspartate-malate shuttle, and CHO cells have high activity of this process 52 , a considerable amount of free glutamate and aspartate would be expected in this cell type. However, we cannot rule out that the deviation is due to the spectral counting quantification method employed.

Frequency of N-linked and O-GalNAc glycosylation sites in CHO. The obtained frequencies for N-linked and O-GalNAc glycosylation sites in CHO HCPs are presented in Table 2. The frequency of O-GalNAc glycosylation is estimated to be 26.64 sites per 100 host cell proteins, while the frequency of N-linked glycosylation is lower, at 8.09 sites per 100 host cell proteins. These values are significantly lower than those derived in previous studies 16,17 , where the N-glycosylation frequency was taken to be 170 sites per 100 host cell proteins and the O-GalNAc frequency 158 sites per 100 host cell proteins. These studies did not consider HCP relative abundance or the number of glycosylation sites per HCP. To refine these initial estimates, both components are included in our strategy to obtain more representative values of HCP N-linked and O-GalNAc glycosylation sites across the CHO proteome. When considering relative HCP abundance, only ~4% (mol/mol) of CHO HCPs contain N-glycosites and ~6.5% (mol/mol) contain O-GalNAc sites. Furthermore, a recent publication estimates the frequency of N-linked glycosylation sites across the Pichia pastoris proteome at 1 site for every 4,691 amino acid residues 53 . Using our estimated values for the weighted average CHO HCP length ($Len_{HCP}$ = 413.6 aac/HCP) and N-glycosite frequency ($S_{Nglyc}^{HCP}$ = 0.0809 sites/HCP), a value of 1 N-glycosite for every 5,102 amino acid residues is obtained (a deviation of only 8%). Although differences between P. pastoris and mammalian cell lines are expected, it would seem unusual for them to span two orders of magnitude. Irani et al. 53 obtained their N-glycosite frequency value by dividing the previously reported number of N-linked glycosites in P. pastoris by the total number of amino acid residues in that organism's proteome. Our strategy for obtaining O-GalNAc glycosylation site frequency is analogous, given that data for the O-GalNAc glycosylation sites of CHO cells were available 21 . Overall, we believe that the estimates presented herein and those reported by Irani et al. 53 are more accurate than the previous ones because HCP relative abundance and glycosylation site frequency have been considered. The relative abundance of all N-linked and O-GalNAc glycoproteins, along with their weighted contribution to glycosylation site frequency, are presented in Supplementary Tables ST3 and ST4, respectively.
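The comparison with the P. pastoris N-glycosite frequency can be checked with a short calculation. The Python sketch below uses only the rounded values quoted in the text (413.6 residues per HCP, 0.0809 N-glycosites per HCP, and 1 site per 4,691 residues for P. pastoris); the variable names are illustrative. With these rounded inputs the result lands close to, but not exactly on, the reported 5,102 residues per site, presumably because the reported figure was computed from unrounded values.

```python
LEN_HCP = 413.6        # weighted average HCP length, residues (from the text)
S_NGLYC_HCP = 0.0809   # weighted N-glycosites per HCP (from the text)
PICHIA_RESIDUES_PER_SITE = 4691  # reported for P. pastoris (from the text)

# Residues per N-glycosite in the CHO proteome, from the rounded inputs.
cho_residues_per_site = LEN_HCP / S_NGLYC_HCP

# Relative deviation from the P. pastoris estimate.
deviation = (
    cho_residues_per_site - PICHIA_RESIDUES_PER_SITE
) / PICHIA_RESIDUES_PER_SITE

print(f"{cho_residues_per_site:.0f} residues per site, "
      f"{deviation:.1%} above the P. pastoris value")
```

Either way, the two organisms agree to within roughly ten percent, which is the point of the comparison.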
An interesting feature of our estimates is that the ten highest contributors (out of a total of 437 predicted N-linked glycoproteins) account for 50% of the overall N-linked glycosite frequency. Similarly, the ten highest contributors to O-GalNAc site frequency (out of 346 predicted O-GalNAc glycoproteins) account for 43% of the total O-GalNAc glycosite frequency (Fig. 4). The fact that a small subset of proteins contributes so heavily to CHO HCP glycosylation could prove extremely useful for monitoring HCP glycosylation experimentally. If dynamic variations in the abundance, site occupancy, and glycoform distribution of these top ten contributors are representative of the remaining host cell glycoproteins, they could be used as markers for dynamic variations in HCP glycosylation.
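The share of glycosite frequency carried by the top contributors can be computed from the per-protein weighted contributions (the products of site count and relative abundance listed in ST3 and ST4). A minimal Python sketch follows; the function name and the example contributions are hypothetical and do not reproduce the supplementary data.

```python
def top_k_share(contributions, k=10):
    """Fraction of the total weighted site frequency due to the k largest terms."""
    ranked = sorted(contributions, reverse=True)
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical weighted contributions (sites per HCP), for illustration only.
example_contributions = [5, 3, 1, 1]
share_top_two = top_k_share(example_contributions, k=2)
```

Applied to the 437 predicted N-linked glycoproteins with `k=10`, this is the kind of calculation that would yield the 50% figure quoted above.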
Nine of the eighteen most abundant glycoproteins in the CHO proteome are involved in protein folding (Fig. 4). Five of these are chaperones (endoplasmin, hypoxia up-regulated protein 1, clusterin, 78 kDa glucose-regulated protein, and serpin H1), while four are either disulphide isomerases (protein disulphide-isomerase A3 and protein disulphide-isomerase) or peptide isomerases (peptidyl-prolyl cis-trans isomerase A and peptidyl-prolyl cis-trans isomerase FKBP10). It would be interesting to evaluate whether the carbohydrates bound to these proteins influence their activity. If so, we could envision a situation where HCP glycosylation is limited by NS availability, which, in turn, would lead to accumulation of unfolded or misfolded proteins in the endoplasmic reticulum. Furthermore, many of the enzymes involved in glycosylation are themselves glycoproteins 54,55 . Again, if the activity of these enzymes relies on their carbohydrate composition, reduced NS availability could directly influence recombinant protein glycosylation while simultaneously decreasing glyco-enzyme activity, thus impacting overall protein glycosylation in a feed-forward manner. Although yet to be confirmed, these mechanisms could cause cascading effects whereby reduced NS availability may negatively impact protein folding as well as HCP and recombinant product glycosylation.
Given the role of carbohydrates in cell-cell interactions 56 , the high observed abundance of cell surface, extracellular matrix, and secreted glycoproteins is expected.

Frequency of GAG, GPI, and O-Man glycosylation sites. GAG glycosites. Eleven proteoglycans were found in the CHO proteome, including syndecan, glypican, agrin, decorin, and lumican (ST10). These proteoglycans are reported to contain heparan, chondroitin, and keratan sulphate GAGs 44 . The total weighted frequency of proteoglycans across the CHO proteome was found to be 0.38 sites per 100 CHO HCPs (ST10). This value is over twenty-fold lower than the one obtained for N-linked glycosites and 70-fold lower than that of O-GalNAc glycosites. These results suggest that, although present, proteoglycans are in low abundance compared to N-linked and O-GalNAc glycoproteins and do not contribute considerably to NS consumption towards CHO cellular glycosylation, an observation that has also been made experimentally 39 . We therefore believe that our assumption of disregarding these HCP glycosylation types as considerable sinks for NS consumption is valid.
GPI glycosites. The PredGPI GPI-anchor prediction server (http://gpcr.biocomp.unibo.it/predgpi/) identified 48 GPI-anchoring sites across the CHO proteome (ST10). The predicted GPI-anchoring sites were found on cell surface proteins (e.g. legumain, ephrin, the renin receptor, tissue plasminogen activator), secretory pathway proteins (e.g. ER-Golgi 24 kDa SNARE, transmembrane protein 115, syntaxin-16), as well as various enzymes and chaperones (e.g. UMP-CMP kinase, lipoprotein lipase, tissue α-L-fucosidase, proteasome assembly chaperone 2, peptidyl-prolyl cis-trans isomerase FKBP8). The weighted frequency of free and occupied GPI sites in CHO was found to be 0.447 sites/100 HCPs and 1.655 sites/100 HCPs, respectively. The total number of GPI sites (2.103 sites/100 HCPs) is approximately four-fold lower than the frequency of HCP N-linked glycosites and over 12-fold lower than that of HCP O-GalNAc glycosites. Given that this difference in glycosite frequency is relatively small, the demand of NSs towards GPI glycosylation was included in our estimates. The monosaccharide composition of GPI glycans was assumed to be the one reported for the hamster prion protein GPI anchor (Man4Neu5AcGalNAcGlcN) 57 . The stoichiometric coefficients of NS consumption towards GPI synthesis were calculated using the obtained GPI frequency and the monosaccharide distribution of the hamster prion protein GPI anchor, as described for the N-linked and O-GalNAc glycans in the Materials and Methods section. The stoichiometric coefficients for lipid glycosylation presented in Table 2 are the sum of the stoichiometric coefficients for glycosphingolipid and GPI glycosylation. Individual values for GPI and glycosphingolipid stoichiometric coefficients are presented in Supplementary Table ST9.
O-Man glycosite frequency. Cadherins and plexins have recently been reported as the major O-mannosylated glycoproteins in humans 58 . This study reports 133 O-Man sites across 37 cadherins and 8 O-Man sites in 6 plexins out of a total of 235 identified O-Man glycosites. A third major source of O-Man glycosites was reported to be α-dystroglycan, with 13 sites. In order to evaluate whether O-mannose glycans are considerable sinks for NS consumption in CHO cells, the database of closest homologs to the CHO proteome (ST1) was queried for the presence of cadherins, plexins, and α-dystroglycan. Three cadherins (protocadherin fat 1, protocadherin fat 3, and cadherin 9), four plexins (A4, C1, D1, and A1), and α-dystroglycan were found across the CHO proteome. The total relative abundance of the above seven proteins plus α-dystroglycan was found to be 0.014% of the CHO proteome. In total, these eight proteins contain 11 confirmed O-Man sites 58 . Multiplying this number of sites by the total relative abundance of these proteins yields a weighted O-Man glycosite frequency of 0.154 sites per 100 HCPs. This value is 50-fold lower than the number of N-linked glycosites and nearly 200-fold lower than that of O-GalNAc glycosites. This result confirms that O-Man glycans are in considerably lower abundance than other types of HCP glycosylation and do not significantly contribute to NS consumption.
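The fold differences quoted for the minor glycosylation types follow directly from the reported site frequencies, as the short Python check below shows. All input values are taken from the text; the O-Man frequency is recomputed as 11 sites multiplied by a relative abundance of 0.014%, which expressed per 100 HCPs is 11 × 0.014. The variable names are illustrative.

```python
# Reported weighted site frequencies, in sites per 100 HCPs (from the text).
N_LINKED = 8.09
O_GALNAC = 26.64

minor_types = {
    "GAG": 0.38,
    "GPI": 2.103,
    # 11 confirmed sites x 0.014% abundance = 0.154 sites per 100 HCPs.
    "O-Man": 11 * 0.014,
}

# Fold difference of each minor type relative to the two major types.
fold_lower = {
    name: (N_LINKED / freq, O_GALNAC / freq)
    for name, freq in minor_types.items()
}

for name, (vs_n, vs_o) in fold_lower.items():
    print(f"{name}: {vs_n:.1f}-fold below N-linked, "
          f"{vs_o:.1f}-fold below O-GalNAc")
```

The ratios reproduce the orders of magnitude stated above: roughly 20-fold and 70-fold for GAG sites, about 4-fold and 12-fold for GPI sites, and about 50-fold and well over 100-fold for O-Man sites.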
Stoichiometric coefficients for NSs consumed towards CHO cell glycosylation. The stoichiometric coefficients range over three orders of magnitude (Table 2) and are higher for monosaccharides that are abundant in the average glycoform structure (e.g. GlcNAc in N-linked glycosylation) and lower for less abundant species (e.g. Neu5Gc). The high stoichiometric coefficients for GDP-Man and UDP-Glc towards N-linked glycosylation reflect the presence of their corresponding monosaccharides in the precursor oligosaccharide that initiates N-linked glycosylation, even though most of the mannose and all of the glucose residues are trimmed from the final structure through processing along the secretory pathway. It is worth noting that demand for these NSs may be lower considering that their monosaccharide components could be recycled after being cleaved off of the glycoprotein during the processing reactions.
Interestingly, [...] ) and is due to the high frequency of identified O-GalNAc glycosylation sites across the CHO proteome 21 (Table 2) and the presence of galactose in all O-GalNAc glycans 20 (Fig. 2). The stoichiometric coefficient of UDP-Gal towards lipid glycosylation was calculated as [...] $\nu_{UDPGal}^{NglycHCP}$ = 0.504 nmol/$10^6$ cells). The higher stoichiometric coefficient for lipid glycosylation is likely due to the presence of galactose in the most abundant glycosphingolipids reported for CHO cells 22,59 .
As for UDP-Gal, the highest obtained stoichiometric coefficient for CMP-Neu5Ac consumption is towards O-GalNAc HCP glycosylation ($\nu$ = [...]). Again, this is due to the high frequency of O-GalNAc glycans across the CHO proteome and the high Neu5Ac content (at least one residue per glycan) of O-GalNAc glycans (Fig. 2). The second largest coefficient for CMP-Neu5Ac consumption is towards lipid glycosylation ($\nu_{CMPNeu5Ac}^{GL}$ = 0.403 nmol/$10^6$ cells), and is due to the presence of Neu5Ac on the GM3 ganglioside, which is the most abundant glycosphingolipid in CHO cells 22,59 . The low stoichiometric demand of CMP-Neu5Ac towards HCP N-linked glycosylation ($\nu_{CMPNeu5Ac}^{NglycHCP}$ = 0.034 nmol/$10^6$ cells) is due to the low abundance of the corresponding monosaccharide in CHO HCP N-linked glycans (Fig. 2).

Flux distributions towards cellular and mAb glycosylation. GDP-Fuc and UDP-HexNAc flux towards HCP and mAb glycosylation. GDP-Fuc is exclusively consumed for N-linked glycosylation and its flux is distributed between host cell proteins and recombinant mAb according to demand. In each of the cases tested, the majority of GDP-Fuc flux goes towards mAb production (between 73% and 97%) due to the low abundance of fucosylated species in HCP glycans (Fig. 2). The flux of UDP-HexNAc is more evenly distributed between HCP and mAb glycosylation. Except for the Dool-L dataset, more than half of the UDP-HexNAc flux goes towards mAb glycosylation (Fig. 5), which is likely due to the high GlcNAc content of mAb N-glycans (Fig. 2). Despite the typically higher demand towards mAb glycosylation, the flux of UDP-HexNAc towards HCP glycosylation is significant and ranges between 11.9% and 59.8%. UDP-HexNAc consumption towards HCP glycosylation is almost evenly split between N-linked and O-GalNAc glycosylation. This arises from the similar stoichiometric coefficients obtained for UDP-GlcNAc towards HCP N-linked glycosylation and UDP-GalNAc towards HCP mucin-type glycosylation.
UDP-Gal and CMP-Neu5Ac flux towards cellular and mAb glycosylation. For all scenarios, HCP mucin-type glycosylation consumes the majority of CMP-Neu5Ac (between 73.8% and 75.8%), and the second highest demand for this NS is observed for lipid glycosylation (21.1% to 21.7%). Interestingly, the distribution of CMP-Neu5Ac consumption is almost independent of $R_{q_p/\mu_g}$, which is due to the scarcity of sialic acids in the mAb.
[...] (Table 2), by values of $q_p$ and $\mu_g$ reported for industrial cell lines (Table 1)
For this reason, CMP-Neu5Ac consumption depends almost exclusively on the specific growth rate of CHO cells, and thus the distribution of its consumption remains nearly constant across different $R_{q_p/\mu_g}$ [...] 26 . Under these conditions, the cells characterised by Kantardjieff et al. yield an exceptionally high specific mAb productivity (~100 pg/cell/day), which causes the observed demand in UDP-Gal towards mAb glycosylation. Thus, it is possible to consider all other tested scenarios of $R_{q_p/\mu_g}$ as more representative of typical CHO cell culture. In these typical CHO culture scenarios, more UDP-Gal is consumed towards cellular glycosylation than towards product glycosylation, which is particularly relevant when considering NS precursor feeding strategies. Our results suggest that over half of the NS precursors fed to control mAb galactosylation would be destined to cellular components and would not reach their intended target. This is consistent with previous findings where cell surface galactosylation was observed to increase more (up to four-fold) than mAb galactose content (up to two-fold) with different uridine-manganese-galactose feeding strategies 12 .
The changes in UDP-Gal flux distributions observed at varying $R_{q_p/\mu_g}$ are the broadest observed in this study and are most likely due to the fact that this NS is the only one that is consumed for all considered forms of glycosylation (HCP N-linked and O-GalNAc, lipid, and mAb glycosylation). Given the observed variability of UDP-Gal flux distributions, it is no surprise that cell surface galactose content has been reported to correlate well with antibody galactosylation 12. Furthermore, these results indicate that the partition of UDP-Gal between cellular and product glycosylation may be one of the underlying causes of galactosylation-associated microheterogeneity of mAbs produced in CHO: if a substantial proportion of available UDP-Gal is continuously consumed for cellular glycosylation, availability of this NS may fluctuate considerably throughout cell culture and thus lead to variations in rTP galactose content.
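The dependence of the flux partition on $R_{q_p/\mu_g}$ can be written out explicitly under two assumptions: that the ratio is defined as $q_p/\mu_g$ (inferred here from its notation), and that flux towards cellular sinks scales with $\mu_g$ while flux towards the mAb scales with $q_p$. The symbols $\nu_{\mathrm{cell}}$ and $\nu_{\mathrm{mAb}}$ are introduced only for this illustration and stand for the summed cellular (HCP N-linked, HCP O-GalNAc, and lipid) and mAb stoichiometric coefficients of a given NS, expressed in mutually consistent units.

```latex
\[
R_{q_p/\mu_g} = \frac{q_p}{\mu_g},
\qquad
f_{\mathrm{mAb}}
  = \frac{\nu_{\mathrm{mAb}}\, q_p}{\nu_{\mathrm{cell}}\, \mu_g + \nu_{\mathrm{mAb}}\, q_p}
  = \frac{\nu_{\mathrm{mAb}}\, R_{q_p/\mu_g}}{\nu_{\mathrm{cell}} + \nu_{\mathrm{mAb}}\, R_{q_p/\mu_g}}
\]
```

Under these assumptions, the fraction of flux directed to the mAb, $f_{\mathrm{mAb}}$, increases monotonically with $R_{q_p/\mu_g}$ whenever $\nu_{\mathrm{mAb}} > 0$. When $\nu_{\mathrm{mAb}}$ is close to zero, as expected for an NS that is scarce in the mAb glycans, $f_{\mathrm{mAb}}$ stays near zero and the split among cellular sinks does not depend on $R_{q_p/\mu_g}$.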
Given that excess uridine and galactose supplementation has been reported to hinder cell growth and final product titre 10,12,13, our results for UDP-Gal flux distributions further highlight the importance of having accurate estimates for NS consumption towards HCP glycosylation. Indicative estimates for the consumption of UDP-Gal towards HCP glycosylation are necessary to determine optimal uridine and galactose feeding strategies that ensure adequate rTP galactosylation while minimising (or completely avoiding) deleterious effects on cell growth and product yield.
The results obtained with our proposed framework are in qualitative agreement with previous findings, but the estimates could be influenced by: i) variations in the relative abundance of CHO HCPs and lipids, ii) variations in the occupancy of N-linked, O-GalNAc, and lipid glycosylation sites, and iii) changes in glycosylation microheterogeneity. All three of the above are likely to vary over time, depending on cell culture conditions, NS precursor supplementation strategy, growth phase, and cellular metabolic state. Indeed (and possibly due to a combination of proteomic, macroheterogeneity, and microheterogeneity effects), qualitative variations in CHO cell surface glycosylation have been reported 12,60. The overall framework presented herein could be readily applied, via the CHO cellular glycosylation calculator provided in the supplementary file, to any experimental data so that variations in the demand for NSs towards cellular and rTGP glycosylation can be represented for different cell lines grown under different culture conditions. Experimentally, these limitations can also be addressed by monitoring the relative abundance and glycosylation of the subset of CHO proteins that were found to contribute most to HCP glycosylation.

Conclusions
The work presented herein describes a strategy to estimate the metabolic demand of NSs towards protein and lipid glycosylation in CHO cells. The estimate combines the relative abundance of individual CHO host proteins with the number of N-linked and O-GalNAc glycosylation sites present on each HCP and the reported CHO HCP glycome. The stoichiometric demand of NSs towards glycolipid (glycosphingolipid and GPI anchor) synthesis has also been included. Overall, our results show that the consumption of NSs towards HCP glycosylation is, in most cases, significant and cannot be neglected when rationally designing NS precursor feeding strategies or while developing mechanistic mathematical models for cell growth and recombinant protein glycosylation. The obtained stoichiometric coefficients for NS consumption towards cellular glycosylation are a first approximation towards a fuller quantitative understanding of how the processes of HCP, glycosphingolipid, and rTP glycosylation are integrated. In addition, we have identified a subset of CHO HCPs which, if monitored for abundance, glyco-site occupancy, and glycan microheterogeneity, could refine the obtained stoichiometric coefficients, and thus improve our quantitative understanding of cellular and rTP glycosylation. In the future, analysis of the overall burden of protein glycosylation on cellular metabolism will lead to mechanistically-defined feeding strategies for quantitative and optimal control of rTP glycosylation.

Scientific Reports | 6:28547 | DOI: 10.1038/srep28547