Genome-scale reconstructions of the mammalian secretory pathway predict metabolic costs and limitations of protein secretion

In mammalian cells, >25% of synthesized proteins are exported through the secretory pathway. The pathway complexity, however, obfuscates its impact on the secretion of different proteins. Unraveling its impact on diverse proteins is particularly important for biopharmaceutical production. Here we delineate the core secretory pathway functions and integrate them with genome-scale metabolic reconstructions of human, mouse, and Chinese hamster ovary cells. The resulting reconstructions enable the computation of energetic costs and machinery demands of each secreted protein. By integrating additional omics data, we find that highly secretory cells have adapted to reduce expression and secretion of other expensive host cell proteins. Furthermore, we predict metabolic costs and maximum productivities of biotherapeutic proteins and identify protein features that most significantly impact protein secretion. Finally, the model successfully predicts the increase in secretion of a monoclonal antibody after silencing a highly expressed selection marker. This work represents a knowledgebase of the mammalian secretory pathway that serves as a novel tool for systems biotechnology.

that could be used for antibody production, we predicted how much additional antibody could be synthesized with the elimination of the neoR gene. We simulated antibody production following a complete knockout of neoR (see Table 2 and Fig. 5b) and predicted that the deletion of neoR could increase specific productivity by up to 4% and 29% on days 3 (early exponential phase) and 6 (late phase) of culture, respectively (Fig. 5c). This was qualitatively consistent with the experimentally observed values of 2% and 14% when neoR mRNA was knocked down by 80-85%. We then computed the Pareto optimality curves for both the control and the neoR in silico knockout conditions on day 6. We found that the length of the curve (denoted by ) increased by 18% when neoR production was eliminated (Fig. 5d). Thus, iCHO2048s can quantify how much non-essential gene knockouts can boost growth and productivity in CHO cells by freeing energetic and secretory resources. In fact, the ribosome-profiling data from Kallehauge et al. revealed that only 30 secretory proteins in CHO cells account for more than 50% of the ribosomal load directed towards translation of proteins bearing a signal peptide (Fig. 4E). Indeed, we recently found that substantial resources can be liberated and recombinant protein titers can be increased when 14 high-abundance host cell proteins are knocked out 20.
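The comparison of Pareto curve lengths described above can be illustrated with a minimal sketch. It assumes the frontier is available as a list of (growth, productivity) points and measures its length as the sum of straight segments between neighbouring points. The text does not state how the curve length was actually computed, so the function names, the polyline approximation, and the numbers below are hypothetical, and the axes are assumed to be already scaled to comparable units.

```python
from math import hypot


def frontier_length(points):
    """Length of a piecewise-linear curve through (growth, productivity) points.

    ASSUMPTION: the frontier is approximated by straight segments between
    points sorted by growth rate; both axes are pre-scaled to comparable units.
    """
    ordered = sorted(points)
    return sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(ordered, ordered[1:])
    )


def relative_change(new, old):
    """Fractional change of `new` relative to `old`."""
    return (new - old) / old


# Hypothetical frontiers (illustrative values only, not data from the study).
control = [(0.0, 3.0), (4.0, 0.0)]
knockout = [(0.0, 3.6), (4.8, 0.0)]

change = relative_change(frontier_length(knockout), frontier_length(control))
print(f"Frontier length change: {change:.0%}")
```

A longer frontier in the knockout condition would indicate a larger set of attainable growth and productivity combinations, which is the sense in which the 18% increase is reported above.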

Mammalian cells synthesize and process thousands of proteins through their secretory pathway. Many of these proteins, including hormones, enzymes, and receptors, are essential for mediating mammalian cell interactions with their environment. Therefore, many have therapeutic importance either as drugs or as targets. The expression and secretion of recombinant proteins represent a significant anabolic demand that drains several substrates from cellular metabolism (e.g., amino acids, sugar nucleotides, ATP) 21,22. Furthermore, the recombinant proteins demand adequate expression of supporting proteins involved in their transcription, translation, folding, modification, and secretion. Thus, there has been an increasing interest in engineering the mammalian secretory pathway to boost protein production 23-26. Despite important advances in the field 27, current strategies to engineer the secretory pathway have remained predominantly empirical 28,29. Recent modeling approaches, however, have enabled the analysis of the metabolic capabilities of important eukaryotic cells under different genetic and environmental conditions 17,30-32. With the development of genome-scale models of protein-producing cells, such as CHO 17, HEK-293 33 and hybridomas 34,35, it is now possible to gain a systems-level understanding of the mammalian protein production phenotype 36.

Efforts have been underway to enumerate the machinery needed for protein production. For example, Lund and colleagues 6 recently reconstructed a comprehensive genetic network of the mouse secretory pathway. By comparing the mouse and CHO-K1 genomes and mapping CHO gene expression data onto this network, the authors identified potential targets for CHO cell engineering, demonstrating the potential of systems biology to interrogate and understand protein secretion in animal cells. This genetic network reconstruction, although useful for contextualizing omics data (e.g., RNA-Seq), is not set up for simulations of protein production, nor is it integrated with additional cellular processes such as metabolism. Therefore, our work is complementary in that it also allows one to quantify the cost and cellular capacity for protein production by delineating the mechanisms of all biosynthetic steps and bioenergetic processes in the cell.

Here we presented the first genome-scale reconstruction of the secretory pathway in mammalian cells coupled to metabolism. We connected this to current metabolic networks, yielding models of protein secretion and metabolism for human, mouse and CHO cells. These models compile decades of research in biochemistry and cell biology of higher eukaryotes and present it in a mathematical model. Using our model, we quantitatively estimated the energetic cost of producing several therapeutic proteins and all proteins in the CHO cell and human secretomes. We also identified factors limiting the secretion of individual products and observed that these depend on both the complexity of the product and the composition of the culture media. Furthermore, by integrating ribosomal profiling data with our model we found that CHO cells have selectively suppressed the expression of energetically expensive secreted proteins. Expanding upon this observation, we demonstrated that specific productivities can be predictably increased following the knock-down of an energetically expensive, non-essential protein. Consistent with this, we have recently shown reductions of more than 50% in total host cell protein production, along with increases in mAb titer, when deleting 14 highly abundant proteins in CHO cells. Future studies will likely explore how much of the CHO cell proteome can be deleted to further enhance recombinant protein secretion 20.

It is important to note that while our models capture major features of secreted proteins, there are additional PTMs (e.g., phosphorylation, gamma carboxylation), pathway machinery (e.g., chaperones), and cell processes that could possibly be captured in further expansions of the modeling framework 6 (e.g., the unfolded protein response). These could be included as energetic costs associated with building and maintaining the secretory machinery (chaperones 3, disulfide oxidoreductases 37, glycosyltransferases 38); protein stability and turnover rates 39; solubility constraints 40 and molecular crowding effects 41. As these are captured by the models in a protein product-specific manner, predictions of protein production capacity will improve, and the models could provide further insights for cell engineering for biotechnology or to obtain a deeper understanding of mechanisms underlying amyloid diseases. Finally, a simplification of our secretory model is that it only computes the bioenergetic cost of synthesizing and attaching single

In conclusion, the results of our study have important implications regarding the ability to predict protein expression based on protein-specific attributes and energetic requirements. The secretory pathway models here stand as novel tools to study mammalian cells and the energetic trade-off between growth and protein secretion in a product- and cell-specific manner. We presented algorithms that provide novel insights with our models, and expect that many other methods can be developed to answer a wide array of questions surrounding the secretory pathway, as seen for metabolism 45. To facilitate further use of these models, we provide our code and detailed instructions on how to construct protein-

The number of N-linked glycosylation sites in the PSIM was determined computationally and experimentally as follows. CHO-K1 cells (ATCC) were lysed, denatured, reduced, alkylated and digested by trypsin. Desalted peptides were incubated with 10 mM sodium periodate in the dark for 1 hour before coupling to 50 μL of (50% slurry) hydrazide resins. After incubation overnight, non-glycosylated peptides were washed away with 1.5 M NaCl and water. The N-glycosylated peptides were released with PNGaseF at 37 °C and desalted by using a C18 SepPak column. Strong cation exchange (SCX) chromatography was used to separate the sample into 8 fractions. Each fraction was analyzed on an LTQ-Orbitrap Velos (Thermo Electron, Bremen, Germany) mass spectrometer. During the mass spectrometry data analysis, carbamidomethylation was set as a fixed modification while oxidation, pyroglutamine and deamidation were variable modifications.

We wrote a Jupyter Notebook in Python (see Jupyter Notebook A) that takes a row from the PSIM as input to produce an expanded iCHO2048s, Recon 2.2s, or iMM1685s metabolic model with the product-

Energetic cost of vesicular transport and secretion: We used published data 51-53 (see Supplementary Data 1) to compute stoichiometric coefficients for reactions involving vesicular transport, that is, the number of GTP molecules bound to RAB and coat proteins in each type of vesicle (COPII and secretory vesicles).
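For the computational side of the N-linked glycosylation site count mentioned in this section, a common first pass is to scan the sequence for the canonical N-X-S/T sequon, where X is any residue except proline. The sketch below illustrates that scan only. The text does not state which tool or rule was used for the PSIM, so the function name and the use of this motif are assumptions, and a sequon match indicates a candidate site rather than confirmed occupancy.

```python
import re

# Canonical N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# A lookahead is used so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(sequence):
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X != Pro).

    ASSUMPTION: this motif scan stands in for the unspecified computational
    step; it reports candidate sites, not experimentally occupied ones.
    """
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical toy sequence, for illustration only.
example = "ANNTSAQNPSK"
print(sequon_positions(example))
```

Candidate counts from such a scan could then be compared against the sites recovered by the hydrazide capture and PNGaseF release workflow described above.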

Construction of models and constraint-based analysis
We found that a total of 192 and 44 GTPs must be hydrolyzed to transport one COPII or secretory (i.e. clathrin coated) vesicle from the origin membrane to the target membrane, respectively. Since vesicles do not transport a single protein molecule at a time, we estimated the number of secreted protein molecules that would fit inside a spherical vesicle (see estimated and assumed diameters in the Supplementary Data 1). For that, we assumed that the secreted protein is globular and has a volume V_P (nm³) that is directly proportional to its molecular weight MW 54:
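The per-protein GTP cost implied by this reasoning can be sketched as follows. The GTP counts per vesicle (192 for COPII, 44 for secretory vesicles) come from the text. The proportionality constant of 1.212e-3 nm³ per dalton is a commonly used estimate for globular proteins and is an assumption here, as are the vesicle diameter in the example and the idealized packing in which the whole vesicle volume is filled. The values actually used are those given in Supplementary Data 1.

```python
from math import floor, pi

# GTP hydrolyzed per vesicle transported, as stated in the text.
GTP_PER_VESICLE = {"COPII": 192, "secretory": 44}

# ASSUMPTION: volume per dalton for a globular protein (nm^3/Da); the value
# used in the original model is not reproduced in this section.
NM3_PER_DALTON = 1.212e-3


def protein_volume_nm3(mw_da):
    """Volume V_P of a globular protein, proportional to molecular weight."""
    return NM3_PER_DALTON * mw_da


def sphere_volume_nm3(diameter_nm):
    """Volume of a spherical vesicle from its diameter."""
    return pi * diameter_nm ** 3 / 6


def proteins_per_vesicle(mw_da, vesicle_diameter_nm):
    """Whole protein molecules fitting in a vesicle (ASSUMPTION: ideal packing)."""
    return floor(sphere_volume_nm3(vesicle_diameter_nm) / protein_volume_nm3(mw_da))


def gtp_per_protein(mw_da, vesicle_type, vesicle_diameter_nm):
    """GTP cost of one vesicle, shared among the proteins it carries."""
    n = proteins_per_vesicle(mw_da, vesicle_diameter_nm)
    if n < 1:
        raise ValueError("protein does not fit in a vesicle of this diameter")
    return GTP_PER_VESICLE[vesicle_type] / n


# Hypothetical example: a 150 kDa protein in a 60 nm COPII vesicle.
cost = gtp_per_protein(150_000, "COPII", 60.0)
print(f"{cost:.3f} GTP per secreted protein molecule")
```

Dividing the fixed cost of a vesicle by its cargo count makes the transport coefficient product-specific: larger proteins fit fewer copies per vesicle and therefore carry a higher GTP cost per molecule.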

Constraints used in models and Pareto optimality frontiers
All models were constrained using different sets of experimental uptake rates, which can be found in
representative biotherapeutics produced in CHO cells are shown for comparison purposes (see Table 1).
Growth rates were computed using an IgG-specific iCHO2048s model and compared to experimentally measured growth rates from six datasets from two previous studies using IgG-producing cell lines 12,18. NT and TK specify the initials of the first author of the two studies (Neil Templeton, Thomas Kallehauge). (b) Additional growth, productivity, and metabolomic data were obtained from Enbrel and C1INH-producing
phase were consistent with experimental growth rates of Enbrel-producing CHO cells and C1INH-producing CHO cells at almost all time points. In all cases, the iCHO2048s models were constrained to produce the recombinant protein at the measured specific productivity rate. The values used to constrain
to quantify the contribution of PTMs to the explained variation in specific productivity using uptake rates different from those used in Figure 4c. The specific consumption rates are listed in Supplementary Table 3

Overview of the Secretory Pathway in animal cells
Historically, most of the knowledge on the secretory pathway was obtained by studying protein transport processes and secretion in Saccharomyces cerevisiae 1. Although quite similar in core functions, the secretory pathways of mammalian cells and fungi differ significantly in some steps, which have evolved according to species-specific secretion phenotypes 2. The following paragraphs briefly overview the mammalian secretory pathway and highlight pathways that are exclusive to animals and not present in fungi. The last section provides an in-depth comparison of the yeast and animal secretory pathways, highlighting the most important differences between the two.

Translocation and processing in endoplasmic reticulum
Proteins destined for the secretory pathway generally bear a signal peptide at the amino-terminus, which targets the proteins to the endoplasmic reticulum (ER), where the initial post-translational modifications (PTMs) take place. This transport requires translocating the target protein across the ER membrane through two general pathways: co-translational translocation (GTP dependent) and post-translational translocation (ATP dependent) 3. An additional pathway for inserting tail-anchored (TA) proteins into the ER membrane has also been discussed in the literature and is included in our iCHO1921s reconstruction 4,5.
Once inside the ER lumen, the target proteins are folded by the action of several ER proteins, including the transmembrane chaperone calnexin, its luminal homolog calreticulin, and other luminal chaperones 6-8. In the event of protein misfolding, a target protein may go through a "quality control" system (exclusive to the mammalian secretory pathway) that attempts to correct folding errors 9,10. However, if the misfolded state of the protein is sustained for too long, the protein enters the ER-associated degradation pathway, or ERAD, which involves retrotranslocation of the misfolded protein back to the cytosol, ubiquitination and proteasomal degradation 11-13.

A note on translocation pathways
In co-translational translocation, proteins destined for the secretory pathway bear a hydrophobic signal sequence at the amino-terminus that promotes the targeting of ribosome-nascent chain (RNC) complexes to the ER via binding to the signal recognition particle (SRP). The SRP recognizes the signal peptide as soon as it emerges from the ribosome during translation. Then, the newly formed SRP-RNC complex is recognized by the SRP receptor on the ER membrane, where translocation is initiated by interaction with the Sec61 complex (Sec61C) and assisted by the chaperone BiP, which increases the efficiency and ensures the unidirectionality of this process 30.
Post-translational translocation, in contrast to co-translational translocation, occurs independently of SRP and its receptor 34. Furthermore, this process relies less heavily on Sec61C to translocate the target protein and instead uses the protein Sec62 as a safe route that guarantees the efficient translocation of small proteins (<160 amino acids in length) 35.
Finally, the pathway for inserting TA proteins into the ER membrane also occurs post-translationally because the ER targeting signal of TA proteins is located very close to the carboxy-terminus, which allows the ribosome to release the protein before it is recognized and localized to the ER 36. This pathway depends on ATP, and one of the main players in the process is a transmembrane recognition complex known as TRC40 or Asna1 37.
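The three routes described above can be summarized as a simple decision rule. The sketch below is a deliberately coarse illustration rather than the reconstruction's actual assignment logic: the `Protein` record, its field names, and the hard 160-residue cutoff applied as a strict threshold are assumptions, and real proteins do not always follow such a clean rule.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Length below which post-translational translocation is assumed (from the text).
SMALL_PROTEIN_CUTOFF_AA = 160


@dataclass
class Protein:
    """Hypothetical minimal record of the features that decide the route."""
    name: str
    length_aa: int
    has_signal_peptide: bool
    is_tail_anchored: bool = False


def translocation_route(protein: Protein) -> Optional[Tuple[str, str]]:
    """Return (route, energy carrier), or None if the protein is not ER-targeted.

    ASSUMPTION: routes are mutually exclusive and chosen by simple rules.
    """
    if protein.is_tail_anchored:
        # C-terminal targeting signal: inserted after release from the ribosome.
        return ("TA insertion (TRC40/Asna1)", "ATP")
    if not protein.has_signal_peptide:
        return None
    if protein.length_aa < SMALL_PROTEIN_CUTOFF_AA:
        # Small proteins: SRP-independent, Sec62-assisted route.
        return ("post-translational (Sec62)", "ATP")
    # Default for signal-peptide-bearing proteins: SRP, SRP receptor, Sec61C.
    return ("co-translational (SRP/Sec61C)", "GTP")


# Hypothetical examples, for illustration only.
for p in (
    Protein("large secreted protein", 450, True),
    Protein("small secreted peptide", 110, True),
    Protein("tail-anchored protein", 300, False, is_tail_anchored=True),
):
    print(p.name, "->", translocation_route(p))
```

Encoding the route together with its energy carrier mirrors the distinction drawn above between the GTP-dependent co-translational pathway and the ATP-dependent post-translational and TA pathways.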

Important differences between the yeast and animal secretory pathways
As mentioned above, core functions of the secretory pathway are conserved between mammalian and yeast cells. These core functions (see Table SD.2) are:
• Translocation through the endoplasmic reticulum
• Primary glycosylation in the ER (N-linked glycans) and Golgi (N-linked and O-linked glycans)
• Protein folding and quality control in the ER
• Anterograde and retrograde vesicular transport between ER and Golgi via COPII and COPI vesicles, respectively
• Dolichol pathway for N-linked core glycan translocation through the ER membrane
• Endoplasmic reticulum associated degradation (ERAD)
• GPI biosynthesis
• Unfolded protein response (UPR)
Nevertheless, minor and major differences exist between the yeast and mammalian secretory pathways.
Some of these differences have been thoroughly reviewed by Delic and colleagues 2 and are summarized in Table SD.1 below. Here, we highlight the major differences between the two secretory pathways that are relevant for modeling purposes using the secretory reconstructions. Finally, the table below summarizes the differences between the mammalian and the fungal secretory pathway reconstructions in terms of components, reactions, and subsystems.