Comprehensive modeling of cell culture profile using Raman spectroscopy and machine learning

Tanemura, Hiroki; Kitamura, Ryunosuke; Yamada, Yasuko; Hoshino, Masato; Kakihara, Hirofumi; Nonaka, Koichi

doi:10.1038/s41598-023-49257-0

Published: 09 December 2023

Scientific Reports volume 13, Article number: 21805 (2023)

Abstract

Chinese hamster ovary (CHO) cells are widely utilized in the production of antibody drugs. To ensure the production of large quantities of antibodies that meet the required specifications, it is crucial to monitor and control the levels of metabolites comprehensively during CHO cell culture. In recent years, continuous analysis methods employing on-line/in-line techniques using Raman spectroscopy have attracted attention. While these analytical methods can nondestructively monitor culture data, constructing a highly accurate measurement model for numerous components is time-consuming, making it challenging to implement in the rapid research and development of pharmaceutical manufacturing processes. In this study, we developed a comprehensive, simple, and automated method for constructing Raman models of various components measured by LC–MS and other techniques using machine learning with Python. Preprocessing and spectral-range optimization of data for model construction (partial least squares (PLS) regression) were automated and accelerated using Bayesian optimization. Subsequently, models were constructed for each component using various model construction techniques, including linear regression, ridge regression, XGBoost, and neural network. This improved model accuracy compared with PLS regression alone. This automated approach allows continuous monitoring of various parameters for over 100 components, facilitating process optimization and process monitoring of CHO cells.

Introduction

In recent years, monoclonal antibodies (mAbs) developed using genetic recombination techniques have garnered increasing attention owing to their high specificity and efficacy. Since the approval of muromonab-CD3 in 1986, antibody-based drugs have been predominantly developed for cancer and autoimmune diseases, with over 120 drugs approved by 2021¹. Chinese hamster ovary cells (CHO cells) are the primary choice for manufacturing antibody drugs, and efforts have been made to develop a stable antibody production process². The productivity of antibodies in CHO cells significantly impacts the cost of production and stability of supply. Ensuring production of the required quantity in a single production run is desirable, especially considering the limited production facilities available. With the demand for antibody drugs growing each year, there is a strong societal need to improve their productivity.

In biopharmaceutical production, there are two main methods for culturing CHO cells: fed-batch culture and perfusion culture. With either method, it is crucial to appropriately monitor and control key factors to achieve high production of high-quality antibodies³. Medium components such as glucose and amino acids, along with various metabolites, play a significant role in the productivity and quality of antibodies. Analyzing and managing the concentrations of these factors is known to improve antibody productivity⁴. Oxidative and endoplasmic reticulum stress during cell culture may also impact antibody productivity⁵. Previously, we identified the Hspa5 promoter, under which antibody expression is suggested to be directly affected by endoplasmic reticulum stress⁶. Stress involves several factors rather than a single one. Therefore, monitoring stress markers comprehensively is especially important for this type of promoter.

At present, to monitor the cell culture profile during production culture, small samples are taken at certain culture points, and medium components and metabolites are quantified using a bioanalyzer or LC–MS. However, this sampling process poses challenges, including potential effects on the culture volume and the risk of microbial contamination. Moreover, the limited number of sampling points makes it difficult to obtain data at high frequencies. Consequently, various process analytical technology (PAT) methods have been developed for continuous analysis. For example, Raman and near-infrared spectroscopy can provide information on components in the culture solution, while capacitance-based measurements enable cellular concentration analysis⁷. In recent years, the utilization of Raman spectrometers has been explored not only in cultivation processes but also in purification processes to maintain process consistency⁸.

In this study, we focused on the potential of Raman spectrometry as a method for continuous analysis. A Raman spectrometer irradiates a sample with laser light and detects the resulting Raman-scattered light, which carries information on the intrinsic vibrational frequencies of the molecules⁹,¹⁰,¹¹. By detecting this light, the concentration of specific molecules can be measured. In previous studies, Raman measurement systems were developed for primary medium components and metabolite concentrations, including glucose, lactate, and amino acids¹²,¹³. Additionally, feedback control systems have been established to achieve optimal concentrations¹⁴,¹⁵,¹⁶,¹⁷. Moreover, it has been reported that the constructed model is scalable across different culture scales¹⁸. Reports of measuring cell growth, pH, and antibody quality, as well as applications in perfusion culture, have also been published¹⁹,²⁰,²¹,²²,²³,²⁴. Furthermore, equipment for acquiring Raman spectra using small-scale reactors has been commercialized and is effective for constructing models that require a large number of data points²⁵,²⁶.

Partial least squares (PLS) regression is commonly employed as a model-building method; it selects principal components to capture a linear relationship between predictor and response variables. It allows the construction of more accurate models by reducing explanatory variables through dimensionality reduction. Previously reported spectral range selection methods include manual selection based on prior knowledge, stepwise selection, and genetic algorithms²⁷,²⁸,²⁹. However, optimizing the model construction conditions, such as the spectral range and the pretreatment of spectral data, to enhance accuracy involves a time-consuming process of trial and error. This makes it challenging to construct highly accurate models for numerous measurement items within the rapid research and development context of pharmaceutical manufacturing processes.

Recently, machine learning has emerged as a method for improving model construction accuracy, and some studies have reported its application in analyzing Raman spectral data³⁰,³¹,³²,³³,³⁴. By incorporating machine-learning techniques into Raman measurement model construction for culture profiles, it is possible to efficiently construct highly accurate models, even when PLS regression fails to produce high-performance models.

In this study, we focused on measuring a broad range of parameter categories and developed a comprehensive, simple, and automated method for constructing an exhaustive Raman model using machine learning in Python. This method enabled convenient and high-throughput data acquisition for Raman model construction by utilizing a small automated culture vessel. As we aimed to comprehensively construct models for a large number of components, we employed Bayesian optimization to optimize the preprocessing, spectral range, and hyperparameters. Bayesian optimization estimates the global optimum of an unknown function by learning it from data using Gaussian process regression³⁵ while minimizing the number of trial iterations, which makes it easy to construct suitable models for various components. Subsequently, models were constructed for each component using various model construction techniques, such as linear regression, ridge regression, XGBoost, and neural network, and their accuracies were compared.

Results

Automated and comprehensive construction of optimal Raman models by PLS regressions with Bayes optimization

To construct a Raman measurement model for data from various cell cultures, we initially cultured CHO cells using multiple small culture vessels (Ambr250), sampled them over time, and measured the levels of metabolites and medium components. Raman spectra were acquired using the Spectroscopy module at the same time as sampling. To obtain results from diverse cultures, three strains were used as cell clones, and fed-batch cultures were performed in duplicate. Several analyses were performed, including metabolite analysis using a bioanalyzer, antibody concentration analysis by HPLC, cell concentration and cell viability analyses by Vi-CELL, and metabolite component analysis by LC–MS, to comprehensively acquire data from various cultures (Supplemental Fig. 1).

Subsequently, a Raman measurement model for data from various cultures was constructed using PLS regression from the obtained culture data and Raman spectral data. The spectral region used for model construction, the hyperparameter (n_components) of the PLS regression, and the data preprocessing method were optimized following the scheme shown in Fig. 1A. The process involved the following steps. (1) Specifying the spectral regions used for model construction. (2) Generating datasets for the specified spectral regions and preprocessing them in the order of untreated, normalized, smoothed (moving average of 10 points), first-order differential, and second-order differential. (3) Creating a dataset that includes the preprocessed spectral data along with the concentration data of the target component (after normalization). The dataset was divided into training and test data, with five of the six reactor datasets used as training data and one as test data. Principal component counts for PLS regressions ranged from 0 to 15. (4) Creating a model by PLS regression using the Python machine-learning library scikit-learn. (5) Evaluating the model performance using the test data and calculating R² and root mean squared error (RMSE) as indicators. (6) Changing the spectral region and the number of PLS principal components, with Bayesian optimization specifying the parameters to be examined next, and performing PLS regression again. (7) Repeating the above steps and determining the spectral range, pretreatment conditions, and number of PLS principal components at which RMSE was minimal when the test data were applied to the model.
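The preprocessing variants in step (2) can be illustrated with a minimal, dependency-free sketch. It assumes that the five treatments are alternative candidates applied to each spectrum separately, that "normalized" means per-spectrum standardization, and that the derivatives are simple adjacent-point differences; none of these details are confirmed by the text, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(spectrum):
    """Scale one spectrum to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(spectrum), pstdev(spectrum)
    if sigma == 0:
        raise ValueError("a constant spectrum cannot be standardized")
    return [(x - mu) / sigma for x in spectrum]


def moving_average(spectrum, window=10):
    """Average each run of `window` adjacent points (output is window - 1 points shorter)."""
    if not 1 <= window <= len(spectrum):
        raise ValueError("window must be between 1 and the spectrum length")
    return [mean(spectrum[i:i + window]) for i in range(len(spectrum) - window + 1)]


def difference(spectrum, order=1):
    """Adjacent-point finite difference, applied `order` times."""
    out = list(spectrum)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


# Candidate treatments, keyed by hypothetical names.
PREPROCESSORS = {
    "untreated": list,
    "standardized": standardize,
    "smoothed": moving_average,
    "first_derivative": lambda s: difference(s, 1),
    "second_derivative": lambda s: difference(s, 2),
}

# Toy stand-in for a spectrum: intensities 0, 1, 4, 9, ... (not real Raman data).
toy = [float(i * i) for i in range(12)]
variants = {name: fn(toy) for name, fn in PREPROCESSORS.items()}
```

Each entry of `variants` would then be paired with the target concentrations and passed to the regression step, so that the search can compare treatments on equal terms.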

Raman models for data from various cultures were constructed using the above methods, the results of which are presented in Table 1. When examining the model accuracy by compound group, a model with an R² greater than 0.5 was constructed for almost all amino acids. Interestingly, models with high measurement accuracy were also constructed for theoretically undetectable components, such as metal ions, oxygen, and carbon dioxide. However, for vitamins, approximately half of the components for which models could be constructed had an R² below 0.5. For metabolites (carbohydrate metabolism, amino acid metabolism, nucleic acid metabolism), approximately half of the models had an R² above 0.5. These results demonstrate the successful construction of Raman measurement models with a certain level of precision for each compound group. Comparing the different preprocessing methods revealed variations in model accuracy (Table 2).

Table 1 Model outcomes (PLS regression) for individual components.

Table 2 Comparing modeling accuracy with different preprocessing methods.

Furthermore, to verify the effect of optimizing the model construction conditions, the model performance was compared to the case where the conditions were not optimized. For the non-optimized case, the full spectral domain was used without preprocessing, and n_components was set to 2, 5, 10, 15, and 20. The constructed models’ R² and RMSE were plotted, and the mean and error ranges were compared using box-and-whisker plots. The results demonstrated that the average model accuracy increased with higher n_components values, and that optimizing the model construction conditions improved the average R² to 0.62 and decreased the average RMSE to 0.35, indicating a significant enhancement of model performance (Fig. 1B).

To visualize the accuracy of measurement of each component, glucose concentration was used as an example. The plots of predicted and actual measurements, the time course of glucose concentration in each culture vessel (actual measurements), the used spectral regions (pretreated), and the modeling factors are shown in Fig. 2A–D. Comparison between the actual time course and prediction results revealed similar transitions in the learning and test data, indicating the construction of a highly accurate prediction model, with R² of 0.93 and RMSE of 0.23 (Fig. 2E,F).

Comparison of machine-learning methods

To assess whether model accuracy could be improved, we examined machine-learning methods other than PLS regression using a similar approach. Linear regression and ridge regression, commonly used as regression analysis methods, were considered as model construction methods. For ridge regression, optimization of the hyperparameter α was also performed. XGBoost and neural network were evaluated as machine-learning techniques; default hyperparameters were used for these two methods. Raman measurement models for each component were constructed using these machine-learning techniques. The R² and RMSE values of the constructed models were plotted, and the means and error ranges were compared using box-and-whisker plots (Fig. 3).

For target components with an R² value below 0.5 in the PLS model, R² improved and RMSE decreased for several components when another modeling method was used (Fig. 3A,B). When considering all components, regardless of the R² value in the PLS model, the average R² of certain categories, such as amino acid metabolites and vitamins, improved with XGBoost, while others, like amino acids, did not show a clear improvement. This suggests that the effect of changing the modeling method depends on the category of the target compound (Fig. 3C,D, Supplemental Fig. 2).

Improved modeling of low-concentration protein by machine learning

Based on the previous results, which demonstrated the successful construction of measurement models for medium components, metabolites, cell proliferation, and product concentration, we aimed to determine whether the measurement scope could be extended further. As an example, we examined whether a measurement model for BiP protein, a marker of endoplasmic reticulum stress, could be constructed³⁶. Monitoring endoplasmic reticulum stress could be useful as it may inhibit protein production and worsen antibody quality in cell culture. The model construction conditions for PLS regression analysis were optimized following the same method as before, resulting in the construction of a Raman measurement model with an R² of 0.84 and RMSE of 0.31 (Fig. 4A). There was noticeable variation between the plots of actual and predicted values for the training data. Although there were greater deviations at the end of culture when comparing the actual (ELISA) and predicted BiP levels over time, the general consistency between the predicted and observed levels for up to 10 days of culture suggested the ability to predict the timing of increase of BiP levels (Fig. 4B). The modeling was then performed using linear regression, ridge regression, XGBoost, and neural network in a similar manner. Upon model construction with XGBoost, the deviation between actual and predicted values improved, and the assessment using the test data resulted in an R² of 0.89 and RMSE of 0.25 (Fig. 4C). The observed and predicted BiP levels over time showed overall consistency throughout the culture (Fig. 4D).

Discussion

In this study, we developed a Python program that automates the optimization of the spectral region, the preprocessing method, and the number of PLS principal components for a wide range of target compounds. We used PLS regression as an example for model construction. These conditions determine whether model accuracy increases or decreases, and Bayesian optimization proves to be a powerful technique for optimizing them at low computational cost. Particularly when optimizing models for multiple targets, automation and computational speed are crucial, which makes this approach suitable for constructing models for various components. Furthermore, we applied the same method to linear regression, ridge regression, XGBoost, and neural network, demonstrating the versatility of the optimization approach beyond PLS regression. This showcases the usefulness of Python programming in selecting the optimal model-building conditions.

In this study, we utilized the Raman sampling module of a small-scale culture vessel Ambr 250 to acquire Raman spectral data. Constructing a Raman model requires a large number of data points. Conventionally, a Raman spectrometer is inserted into a glass vessel to acquire data, but this method is time-consuming, as data on only one culture can be acquired per sensor. In contrast, our approach allows the simultaneous acquisition of Raman spectral data from multiple cultures, enabling the construction of Raman models using data from a single culture. This method proves to be a simple and efficient approach for constructing Raman models, aligning the Raman spectrum with the drug development timeline, and serving as a valuable monitoring method in process development and GMP production. In this study, we constructed models using Raman spectra measured in a microfluidic channel. It was reported that comparable models can be created using this measurement technique and by directly measuring with sensors within the bioreactor²⁵. However, it is important to consider the potential for heterogeneity in the vessel, such as dissolved oxygen, when scaling up from fluid analysis³⁷. Even when using Raman sensors to measure compound concentrations, it is necessary to consider this heterogeneity. Also, for model validation, we used five reactors for model construction and the remaining one reactor as a test dataset to evaluate the predictive accuracy. This was done to clearly observe the time-course changes in the predicted data of the test dataset, as shown in Fig. 2F. However, when actually constructing models using this method, even higher accuracy models may be built by using cross-validation, where test data is randomly sampled from the data of all reactors.
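The two validation schemes contrasted above can be sketched in a few lines of standard-library Python. The layout of 6 reactors with 13 samples each is an assumption (chosen so that the total is 78), and both function names are hypothetical.

```python
import random


def leave_one_reactor_out(reactor_ids):
    """Yield (held_out, train_idx, test_idx), holding out each reactor once."""
    for held_out in sorted(set(reactor_ids)):
        train_idx = [i for i, r in enumerate(reactor_ids) if r != held_out]
        test_idx = [i for i, r in enumerate(reactor_ids) if r == held_out]
        yield held_out, train_idx, test_idx


def random_k_fold(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) for k folds drawn at random across all samples."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    for fold in (order[i::k] for i in range(k)):
        held = set(fold)
        yield [i for i in order if i not in held], fold


# Assumed layout: reactor label for each of the 78 samples.
reactor_ids = [r for r in range(6) for _ in range(13)]

grouped = list(leave_one_reactor_out(reactor_ids))
shuffled = list(random_k_fold(len(reactor_ids), k=6))
```

The grouped split keeps each test reactor's full time course intact, which is what allows a plot such as Fig. 2F; the random split mixes time points from all reactors into every fold.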

PLS regression is commonly used for constructing Raman models. Datasets of Raman spectra contain numerous explanatory variables, making them suitable for regression methods that involve dimensionality reduction, such as PLS regression. PLS regression has the advantages of high speed and comprehensive model construction for each measured object. In this study, we also examined linear regression, ridge regression, XGBoost, and neural network. Some machine-learning techniques exhibited modeling performance surpassing that of PLS regression. Interestingly, the improvement in model accuracy differs depending on the category of the compound. In this study, selecting among machine-learning methods appeared to improve model accuracy more for metabolites and vitamins than for amino acids. These compounds belonged to a group with relatively low accuracy in PLS regression, and it is possible that the effect of improving accuracy is higher for compounds with low accuracy in PLS regression. As shown by the BiP levels in Fig. 4, methods other than PLS regression, such as XGBoost, can improve modeling accuracy for certain targets. PLS regression is a linear regression method that selects principal components to capture a linear relationship between predictor and response variables. It reduces multicollinearity and enables accurate models for multivariate data. XGBoost, on the other hand, is a non-linear algorithm that combines decision trees to capture complex patterns. It evaluates feature importance and employs ensemble learning for more accurate predictions. For some categories of compound, XGBoost may have outperformed PLS regression due to its ability to capture non-linear relationships, select more appropriate features, and reduce bias and variance through ensemble learning. Hyperparameter tuning was not performed for XGBoost and neural networks in this study, but performing hyperparameter tuning in advanced computational environments may lead to the construction of models that outperform PLS regression, linear regression, and ridge regression.

Through the comprehensive analysis of Raman models of various compounds, it was found that the model accuracy for amino acids was generally high, while the accuracy for vitamins was lower. This discrepancy can be explained by two possible factors. First, the Raman spectra may not detect changes when the compound concentration is too low. Second, the accuracy of the offline measurement, in which the compound concentration was determined using LC–MS, may have been compromised at low concentrations, leading to lower accuracy in the Raman measurement model. To improve the accuracy of the Raman measurement model, there is a need to enhance the accuracy of offline measurements. To further improve model accuracy, it is worth considering incorporating information other than Raman spectra into the model. Previous studies have proposed models that incorporate computational fluid dynamics models³⁸ or include process-related impurities and kinetics of each cultivation data³⁹, suggesting that combining this information with Raman spectral data may lead to even higher accuracy models. Additionally, improving the model construction methods is expected to further enhance model accuracy. Narayanan et al. proposed a model construction method that incorporates a Kalman filter⁴⁰, while Poth et al. comprehensively validated algorithms other than those used in this study⁴¹. It is believed that by extending the model construction methods as reported in these studies, the accuracy of models for compounds with lower accuracy might be further improved. Furthermore, using various variable selection methods in addition to Bayesian optimization, as discussed in the Introduction, may also have the potential to improve accuracy. In this study, the specificity of the measurement was confirmed for glucose by observing the concentration increase upon the addition of a glucose solution. Ideally, an addition experiment should be performed for each compound to confirm specificity. However, by constructing a model for data from multiple cultures with different profiles, specificity can be exhibited. In this study, data from six culture vessels with distinct profiles were used, and the measurement models of each culture vessel were constructed, suggesting the specificity of the measurement results.

Raman spectra primarily detect covalent bonds of compounds in solution, theoretically preventing the detection of metal ions, among others. Interestingly, through exhaustive model construction, models were built for compounds that theoretically could not be detected in Raman spectra, such as hydrogen ions, oxygen, carbon dioxide, and metal ions. Additionally, models were constructed for variables without a physical presence, such as cellular viability. Some compound levels correlated with the values to be measured, indirectly allowing the construction of measurement models (Fig. 4E). For instance, cellular viability is known to correlate with LDH⁴², suggesting the possibility of measuring cellular viability indirectly using LDH level determined with a Raman spectrometer as a proxy. This enables the measurement of not only the concentration of a specific compound but also all variables that characterize a cell culture through certain calculations. It is also possible to estimate the levels of compounds based on the spectral domain and model coefficients used for model construction, contributing to the identification of metabolites that correlate with specific parameters.

This study demonstrates the comprehensive construction of highly precise Raman models for measuring the concentrations of various compounds. This allows continuous acquisition of various culture data using a Raman spectrometer, enabling real-time monitoring and feedback control of culture conditions. While previous Raman measurements and feedback controls focused on glucose and amino acid concentrations, the exhaustive model construction approach may facilitate faster medium development by continuously optimizing a wider range of components.

This technique can be easily expanded to model factors such as omics data. By applying the method used in this study, modeling can be performed for various parameters beyond medium components and metabolites. We successfully constructed a predictive model for BiP, an endoplasmic reticulum stress-related factor, with good precision. Additionally, we constructed a model for oxidized glutathione, an oxidative stress-related factor, suggesting the potential for monitoring not only compound concentrations but also various stress markers. Raman modeling can be considered a feature extraction technique for quantifying culture characteristics, and it is highly compatible with AI-related technologies, which have seen remarkable advancements in recent years. Previous studies predicted transcriptome data from Raman spectra⁴³, providing a foundation for predicting multivariate or numerical values. With these technologies, we can develop more comprehensive and accurate models for a broad range of parameters.

Methods

Cell substrates and culture methods used

Fed-batch cultures were performed on three clones expressing IgG from serum-free, floating cells derived from CHO-K1 (CCL-61; ATCC, Manassas, VA, USA)⁴⁴,⁴⁵ in a custom medium (chemically defined) using a 250 mL miniaturized bioreactor (Ambr250; Sartorius, Göttingen, Germany). Cultures were grown at 37 °C, 400 rpm, and maintained below 50% air saturation of dissolved oxygen with a pH of 7.2 (controlled by CO₂ sparging) for 14 days. Two replicates were cultured for each clone. Cell concentration, viability, metabolites, and antibody levels were monitored over time during culture. Cell density and viability were determined using Vi-CELL (Beckman Coulter). Metabolites were analyzed using Bio Profile FLEX2 (Nova Biomedical, Waltham, MA, USA). Antibody levels were analyzed by high-performance liquid chromatography (HPLC) with a Protein A affinity column (Agilent Technologies, Santa Clara, CA, USA) using a PA ID sensor cartridge Φ2.1 mm × 30 mm (ThermoFisher Scientific, Waltham, MA, USA). Antibody levels were described as titers. Decellularized culture supernatants were stored at −20 °C and subjected to medium composition analyses by LC–MS and protein-concentration determination by ELISA.

Raman spectral data acquisition method

During cultivation in Ambr250, 160 µL of the culture broth was sampled and Raman spectra were acquired using a Raman Rxn2 analyzer (Endress+Hauser, Reinach, Switzerland). A laser at 785 nm was applied in the flow cell to acquire spectral data ranging in wavenumber from 150 to 3425 cm⁻¹. The measurements were performed 10 times for 20 s each.

LC–MS

To perform deproteinization, 60 µL of acetonitrile was added to 40 µL of the culture supernatant. The mixture was vortexed and centrifuged at 10,000 rpm for 15 min. The supernatant (50 µL) after centrifugation was diluted with 450 µL of ultrapure water, and 1 µL was subjected to LC–MS analysis. The Nexera System (Shimadzu, Osaka, Japan) was used for HPLC, and LCMS-8040 (Shimadzu) was used as the mass spectrometer. Acetonitrile was used as the mobile phase. The analytical column used was Discovery HS F5 (2.1 mm × 150 mm, 3 µm) (Sigma-Aldrich, St. Louis, MO, USA), and the mobile phases used were 0.1% formic acid–water and 0.1% acetonitrile. Compound identification and quantitation were performed using the LC/MS/MS method package cell-culture profiling (Shimadzu) with data reported as relative concentrations. The acquired data were standardized and used as objective variables for model construction.
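The sample preparation above implies an overall dilution before injection, which can be worked out as a short calculation. This sketch assumes that volumes are additive on mixing and that "diluted with 450 µL" means 450 µL of water was added to the 50 µL aliquot; the helper name is hypothetical.

```python
from fractions import Fraction


def dilution_factor(sample_ul, diluent_ul):
    """Fold dilution when `diluent_ul` is added to `sample_ul` (volumes assumed additive)."""
    return Fraction(sample_ul + diluent_ul, sample_ul)


deproteinization = dilution_factor(40, 60)   # 40 µL supernatant + 60 µL acetonitrile
aqueous = dilution_factor(50, 450)           # 50 µL aliquot + 450 µL ultrapure water
total = deproteinization * aqueous

print(f"deproteinization: {float(deproteinization)}x")
print(f"aqueous dilution: {float(aqueous)}x")
print(f"overall:          {float(total)}x")
```

Because the LC–MS data are reported as relative concentrations and then standardized, a constant factor of this kind would not change the values used as objective variables.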

ELISA

BiP levels were measured by ELISA using the GRP78/BiP ELISA Kit (#AD1-900-214; Enzo Life Sciences, Inc., Farmingdale, NY, USA). Primary antibodies (50 µL) were added to 100 µL of the culture supernatant and gently shaken for 60 min at room temperature. Subsequently, 50 µL of secondary antibody was added to the post-reaction solutions and gently shaken for 60 min at room temperature. The reaction solution was discarded, the wells were washed at least three times with wash buffer, and 200 µL of TMB solution was added and shaken for 30 min to develop color. Finally, 50 µL of stop solution was added to stop the reaction, and the absorbance at 450 nm was measured. BiP levels were quantified from the calibration curves measured using standard solutions.

Model building

All calculations were performed on a Linux server with dual Intel® Xeon® E5-2667 v4 processors (3.20 GHz) and 125 GB RAM, running the Ubuntu 18.04.1 LTS operating system. Python 3.8.8 was used to build the models. Data frames were generated with Raman spectral data as explanatory variables and culture profile data as objective variables. For the Raman spectral data, the minimum wavenumber was set as 100 cm⁻¹, and the maximum was set as 3425 cm⁻¹. The spectral range was defined with a minimum value of 125 cm⁻¹ and a maximum value of 3325 cm⁻¹. Among the 78 sets of data, data acquired from five reactors were used as training data, and data acquired from the remaining reactor were used as test data. The data to be measured were standardized, and for Raman spectrum data, a dataset was created by preprocessing the data in the following order: untreated, standardized, smoothed (moving average of 10 points), first-order differential, and second-order differential. PLS regression, linear regression, ridge regression, and neural network models were built with PLSRegression, LinearRegression, Ridge, and MLPRegressor from the scikit-learn library (version 0.24.1), and XGBoost models were built with XGBRegressor from the XGBoost library (version 1.7.1). For MLPRegressor and XGBRegressor, the default hyperparameters were used. Bayesian optimization was performed using the GPyOpt package (version 1.2.6) with up to 20 attempts. Model performance was assessed using R² and RMSE as indicators⁴⁶. R² measures the proportion of the variance in the dependent variable that can be explained by the independent variables. RMSE is the square root of the mean squared deviation between predicted and actual values, providing an overall measure of accuracy. Each formula is shown below, where $y_{i}$ represents the actual value, $\widehat{y}_{i}$ represents the predicted value, $\overline{y}$ represents the average of the actual values, and $m$ is the number of samples:

$$R^{2} = 1 - \frac{\sum_{i=1}^{m} (y_{i} - \widehat{y}_{i})^{2}}{\sum_{i=1}^{m} (y_{i} - \overline{y})^{2}}$$

$$RMSE = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (y_{i} - \widehat{y}_{i})^{2}}$$

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request. Additional data are available in the supplementary material of this article.

References

Gauzy-Lazo, L., Sassoon, I. & Brun, M. P. Advances in antibody-drug conjugate design: Current clinical landscape and future innovations. SLAS Discov. 25, 843–868. https://doi.org/10.1177/2472555220912955 (2020).
Kunert, R. & Reinhart, D. Advances in recombinant antibody manufacturing. Appl. Microbiol. Biotechnol. 100, 3451–3461. https://doi.org/10.1007/s00253-016-7388-9 (2016).
Bielser, J. M., Wolf, M., Souquet, J., Broly, H. & Morbidelli, M. Perfusion mammalian cell culture for recombinant protein manufacturing: A critical review. Biotechnol. Adv. 36, 1328–1340. https://doi.org/10.1016/j.biotechadv.2018.04.011 (2018).
Ritacco, F. V., Wu, Y. & Khetan, A. Cell culture media for recombinant protein expression in Chinese hamster ovary (CHO) cells: History, key components, and optimization strategies. Biotechnol. Prog. 34, 1407–1426. https://doi.org/10.1002/btpr.2706 (2018).
Prashad, K. & Mehra, S. Dynamics of unfolded protein response in recombinant CHO cells. Cytotechnology 67, 237–254. https://doi.org/10.1007/s10616-013-9678-8 (2015).
Tanemura, H. et al. Development of a stable antibody production system utilizing an Hspa5 promoter in CHO cells. Sci. Rep. 12, 7239. https://doi.org/10.1038/s41598-022-11342-1 (2022).
Gillespie, C. et al. Systematic assessment of process analytical technologies for biologics. Biotechnol. Bioeng. 119, 423–434. https://doi.org/10.1002/bit.27990 (2022).
Yilmaz, D. et al. Application of Raman spectroscopy in monoclonal antibody producing continuous systems for downstream process intensification. Biotechnol. Prog. 36, e2947. https://doi.org/10.1002/btpr.2947 (2020).
Abu-Absi, N. R. et al. Real time monitoring of multiple parameters in mammalian cell culture bioreactors using an in-line Raman spectroscopy probe. Biotechnol. Bioeng. 108, 1215–1221. https://doi.org/10.1002/bit.23023 (2011).
Whelan, J., Craven, S. & Glennon, B. In situ Raman spectroscopy for simultaneous monitoring of multiple process parameters in mammalian cell culture bioreactors. Biotechnol. Prog. 28, 1355–1362. https://doi.org/10.1002/btpr.1590 (2012).
Matuszczyk, J. C. et al. Raman spectroscopy provides valuable process insights for cell-derived and cellular products. Curr. Opin. Biotechnol. 81, 102937. https://doi.org/10.1016/j.copbio.2023.102937 (2023).
Berry, B., Moretto, J., Matthews, T., Smelko, J. & Wiltberger, K. Cross-scale predictive modeling of CHO cell culture growth and metabolites using Raman spectroscopy and multivariate analysis. Biotechnol. Prog. 31, 566–577. https://doi.org/10.1002/btpr.2035 (2015).
Yousefi-Darani, A. et al. Generic chemometric models for metabolite concentration prediction based on Raman spectra. Sensors https://doi.org/10.3390/s22155581 (2022).
Matthews, T. E. et al. Closed loop control of lactate concentration in mammalian cell culture by Raman spectroscopy leads to improved cell density, viability, and biopharmaceutical protein production. Biotechnol. Bioeng. 113, 2416–2424. https://doi.org/10.1002/bit.26018 (2016).
Domján, J. et al. Raman-based dynamic feeding strategies using real-time glucose concentration monitoring system during adalimumab producing CHO cell cultivation. Biotechnol. Prog. 36, e3052. https://doi.org/10.1002/btpr.3052 (2020).
Webster, T. A. et al. Feedback control of two supplemental feeds during fed-batch culture on a platform process using inline Raman models for glucose and phenylalanine concentration. Bioprocess Biosyst. Eng. 44, 127–140. https://doi.org/10.1007/s00449-020-02429-y (2021).
Domján, J. et al. Real-time amino acid and glucose monitoring system for the automatic control of nutrient feeding in CHO cell culture using Raman spectroscopy. Biotechnol. J. 17, e2100395. https://doi.org/10.1002/biot.202100395 (2022).
Kozma, B. et al. On-line prediction of the glucose concentration of CHO cell cultivations by NIR and Raman spectroscopy: Comparative scalability test with a shake flask model system. J. Pharm. Biomed. Anal. 145, 346–355. https://doi.org/10.1016/j.jpba.2017.06.070 (2017).
Rafferty, C. et al. Raman spectroscopy as a method to replace off-line pH during mammalian cell culture processes. Biotechnol. Bioeng. 117, 146–156. https://doi.org/10.1002/bit.27197 (2020).
Eyster, T. W. et al. Tuning monoclonal antibody galactosylation using Raman spectroscopy-controlled lactic acid feeding. Biotechnol. Prog. 37, e3085. https://doi.org/10.1002/btpr.3085 (2021).
Romann, P. et al. Advancing Raman model calibration for perfusion bioprocesses using spiked harvest libraries. Biotechnol. J. 17, e2200184. https://doi.org/10.1002/biot.202200184 (2022).
Graf, A. et al. A novel approach for non-invasive continuous in-line control of perfusion cell cultivations by Raman spectroscopy. Front. Bioeng. Biotechnol. 10, 719614. https://doi.org/10.3389/fbioe.2022.719614 (2022).
Wei, B. et al. Multi-attribute Raman spectroscopy (MARS) for monitoring product quality attributes in formulated monoclonal antibody therapeutics. MAbs 14, 2007564. https://doi.org/10.1080/19420862.2021.2007564 (2022).
Gibbons, L. A. et al. Raman based chemometric model development for glycation and glycosylation real time monitoring in a manufacturing scale CHO cell bioreactor process. Biotechnol. Prog. 38, e3223. https://doi.org/10.1002/btpr.3223 (2022).
Rowland-Jones, R. C. et al. Spectroscopy integration to miniature bioreactors and large scale production bioreactors-Increasing current capabilities and model transfer. Biotechnol. Prog. 37, e3074. https://doi.org/10.1002/btpr.3074 (2021).
Graf, A. et al. Automated data generation for Raman spectroscopy calibrations in multi-parallel mini bioreactors. Sensors https://doi.org/10.3390/s22093397 (2022).
Xiaobo, Z., Jiewen, Z., Povey, M. J., Holmes, M. & Hanpin, M. Variables selection methods in near-infrared spectroscopy. Anal. Chim. Acta 667, 14–32. https://doi.org/10.1016/j.aca.2010.03.048 (2010).
Rammal, A., Perrin, E., Vrabie, V., Assaf, R. & Fenniri, H. Selection of discriminant mid-infrared wavenumbers by combining a naïve Bayesian classifier and a genetic algorithm: Application to the evaluation of lignocellulosic biomass biodegradation. Math. Biosci. 289, 153–161. https://doi.org/10.1016/j.mbs.2017.05.002 (2017).
Devos, O. & Duponchel, L. Parallel genetic algorithm co-optimization of spectral pre-processing and wavelength selection for PLS regression. Chemometr. Intell. Lab. Syst. 107, 50–58. https://doi.org/10.1016/j.chemolab.2011.01.008 (2011).
Maruthamuthu, M. K., Raffiee, A. H., De Oliveira, D. M., Ardekani, A. M. & Verma, M. S. Raman spectra-based deep learning: A tool to identify microbial contamination. Microbiol. Open 9, e1122. https://doi.org/10.1002/mbo3.1122 (2020).
Tulsyan, A. et al. A machine-learning approach to calibrate generic Raman models for real-time monitoring of cell culture processes. Biotechnol. Bioeng. 116, 2575–2586. https://doi.org/10.1002/bit.27100 (2019).
Mo, W. et al. Classification of coronavirus spike proteins by deep-learning-based Raman spectroscopy and its interpretative analysis. J. Appl. Spectrosc. 89, 1203–1211. https://doi.org/10.1007/s10812-023-01487-w (2023).
Liu, W. et al. Determination of benzo(a)pyrene in peanut oil based on Raman spectroscopy and machine learning methods. Spectrochim. Acta A. 299, 122806. https://doi.org/10.1016/j.saa.2023.122806 (2023).
Li, J. Q., Dukes, P. V., Lee, W., Sarkis, M. & Vo-Dinh, T. Machine learning using convolutional neural networks for SERS analysis of biomarkers in medical diagnostics. J. Raman Spectrosc. 53, 2044–2057. https://doi.org/10.1002/jrs.6447 (2022).
Shahriari, B., Swersky, K., Wang, Z., Adams, R. P. & Freitas, N. D. Taking the human out of the loop: A review of Bayesian optimization. Proc. IEEE 104, 148–175. https://doi.org/10.1109/JPROC.2015.2494218 (2016).
Wang, J., Lee, J., Liem, D. & Ping, P. HSPA5 gene encoding Hsp70 chaperone BiP in the endoplasmic reticulum. Gene 618, 14–23. https://doi.org/10.1016/j.gene.2017.03.005 (2017).
Rahimi, M. J., Sitaraman, H., Humbird, D. & Stickel, J. J. Computational fluid dynamics study of full-scale aerobic bioreactors: Evaluation of gas–liquid mass transfer, oxygen uptake, and dynamic oxygen distribution. Chem. Eng. Res. Des. 139, 283–295. https://doi.org/10.1016/j.cherd.2018.08.033 (2018).
Farzan, P. & Ierapetritou, M. G. A framework for the development of integrated and computationally feasible models of large-scale mammalian cell bioreactors. Processes 6, 82 (2018).
Okamura, K., Badr, S., Murakami, S. & Sugiyama, H. Hybrid modeling of CHO cell cultivation in monoclonal antibody production with an impurity generation module. Ind. Eng. Chem. Res. 61, 14898–14909. https://doi.org/10.1021/acs.iecr.2c00736 (2022).
Narayanan, H. et al. Hybrid-EKF: Hybrid model coupled with extended Kalman filter for real-time monitoring and control of mammalian cell culture. Biotechnol. Bioeng. 117, 2703–2714. https://doi.org/10.1002/bit.27437 (2020).
Poth, M., Magill, G., Filgertshofer, A., Popp, O. & Großkopf, T. Extensive evaluation of machine learning models and data preprocessings for Raman modeling in bioprocessing. J. Raman Spectrosc. 53, 1580–1591. https://doi.org/10.1002/jrs.6402 (2022).
Fu, T. et al. Regulation of cell growth and apoptosis through lactate dehydrogenase C over-expression in Chinese hamster ovary cells. Appl. Microbiol. Biotechnol. 100, 5007–5016. https://doi.org/10.1007/s00253-016-7348-4 (2016).
Kobayashi-Kirschvink, K. J. et al. Linear regression links transcriptomic data and cellular Raman spectra. Cell Syst. 7, 104-117.e104. https://doi.org/10.1016/j.cels.2018.05.015 (2018).
Okumura, T. et al. Efficient enrichment of high-producing recombinant Chinese hamster ovary cells for monoclonal antibody by flow cytometry. J. Biosci. Bioeng. 120, 340–346. https://doi.org/10.1016/j.jbiosc.2015.01.007 (2015).
Masuda, K. et al. Novel cell line development strategy for monoclonal antibody manufacturing using translational enhancing technology. J. Biosci. Bioeng. 133, 273–280. https://doi.org/10.1016/j.jbiosc.2021.11.010 (2022).
Karunasingha, D. S. K. Root mean square error or mean absolute error? Use their ratio as well. Inf. Sci. 585, 609–629. https://doi.org/10.1016/j.ins.2021.11.036 (2022).

Author information

Authors and Affiliations

Biologics Technology Research Laboratories I, Biologics Division, Daiichi Sankyo Co., Ltd., 2716-1, Aza Kurakake, Oaza Akaiwa, Chiyoda-Machi, Oura-Gun, Gunma, 370-0503, Japan
Hiroki Tanemura, Ryunosuke Kitamura, Masato Hoshino & Hirofumi Kakihara
Analytical & Quality Evaluation Research Laboratories, Pharmaceutical Technology Division, Daiichi Sankyo Co., Ltd., 1-12-1, Shinomiya, Hiratsuka, Kanagawa, 254-0014, Japan
Yasuko Yamada
Biologics Division, Daiichi Sankyo Co., Ltd., 2716-1, Aza Kurakake, Oaza Akaiwa, Chiyoda-Machi, Oura-Gun, Gunma, 370-0503, Japan
Koichi Nonaka

Contributions

H.T. conceived and conducted the experiment, and prepared the manuscript. R.K. supported data analysis and reviewed the manuscript. Y.Y. supported data analysis and reviewed the manuscript. M.H. reviewed the manuscript. H.K. reviewed the manuscript. K.N. reviewed the manuscript.

Corresponding author

Correspondence to Hiroki Tanemura.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Figures.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Tanemura, H., Kitamura, R., Yamada, Y. et al. Comprehensive modeling of cell culture profile using Raman spectroscopy and machine learning. Sci Rep 13, 21805 (2023). https://doi.org/10.1038/s41598-023-49257-0

Received: 18 August 2023
Accepted: 06 December 2023
Published: 09 December 2023
DOI: https://doi.org/10.1038/s41598-023-49257-0
