Identification, characterization and control of a sequence variant in monoclonal antibody drug product: a case study

Sequence variants (SVs) in protein biotherapeutics can be categorized as unwanted impurities and may raise serious concerns about the efficacy and safety of the product. Early detection of specific sequence modifications that can result in altered physicochemical and/or biological properties is therefore desirable in product manufacturing. Because of their low abundance and the finite resolving power of conventional analytical techniques, SVs are often overlooked in early drug development. Here, we present a case study in which a trace amount of a sequence variant was identified in a monoclonal antibody (mAb) based therapeutic protein by LC–MS/MS, and the structural and functional features of the SV-containing mAb were assessed using appropriate analytical techniques. Further, a very sensitive selected reaction monitoring (SRM) technique was developed to quantify the SV, which revealed both the prominent and the inconspicuous nature of the variant in process chromatography. We present the extensive characterization of a sequence variant in a protein biopharmaceutical and the first report on control of sequence variants to < 0.05% in the final drug product by utilizing an SRM based mass spectrometry method during the purification steps.

Expressing the right clone is one of the important steps in the product development of protein biotherapeutics 1 . In spite of the near absolute fidelity of DNA polymerases, single nucleotide polymorphisms are observed due to erroneous gene transcription, which results in altered amino acid sequences. Sequence alterations can also result from mistranslation or improper tRNA acylation by either nonsense read-through or misreading at the level of transcription or translation 2 . Additionally, mis-cleavage during posttranslational processing can lead to non-native amino acid substitutions 3 . These sequence variants in the final drug product are undesirable, as they may possess altered physicochemical and/or biological properties compared to the wild-type product, which can affect the overall efficacy, stability or safety of the biomolecule drug. The most unwanted outcome of these substitutions is perturbation of the tertiary structure of the protein, leading to the formation of new conformational epitopes that might elicit varying levels of unwanted immune responses. The safety consequences of immune responses to therapeutic protein products are generally unpredictable and can range from no apparent effect to serious adverse events, depending on the immune tolerance of the patient to that particular therapeutic protein. A recent survey conducted by the International Consortium for Innovation & Quality in Pharmaceutical Development (IQ) demonstrated that the biopharmaceutical industry has SV workflows incorporated in early development, with appropriate mitigation strategies to counteract specific mis-incorporation mechanisms at the genetic, translation, and cellular levels 4 . 
The survey also reported that several organizations discard cell lines with > 1% SV, while recognizing that hard limits on SVs are not practical and that a cell line with an SV can be used for further product development if an adequate risk assessment of the criticality of its low-abundance presence in the mAb drug product has been performed. The US Food and Drug Administration (US-FDA) guidelines recommend [...] mis-incorporations as change of cell line may be required while latter can be addressed by media optimization 4 . Depending on the stage of development, this approach may incur a moderate to significant delay in getting the drug to patients. Alternatively, the impact of a very low level of sequence variant in a functionally inactive region of the protein can be nullified theoretically 9 and the development can move forward. Although this approach avoids any delay in the program, it comes with a bigger risk: the possibility of failing on immunogenicity during the clinical trial. Here we report a third approach, in which the physicochemical and functional properties of a glutamic acid (E) to lysine (K) sequence variant, identified by LC-MS/MS in the end-of-fermentation product during initial development of a monoclonal antibody based therapeutic, were studied thoroughly by an array of analytical techniques, and additional process steps and highly sensitive analytical methods were implemented to ensure that the sequence-variant-containing version is efficiently controlled in the product. Multiple batches of drug product containing less than 0.04% sequence variant were thus manufactured using this approach. In this particular case, the approach not only avoided the delay of starting again with a new clone but also mitigated the risk of failure at the clinical trial stage. A similar control strategy can be adopted for other undesirable sequence variants using their unique physicochemical properties.

Results
Early detection of the sequence variant. The monoclonal antibodies (mAbs) undergo different chemical and enzymatic post-translational modifications (PTM). Although LC-MS/MS based peptide map analysis in high resolution mass spectrometers (HRMS) coupled with software driven search options during data analysis is a powerful tool to detect the PTMs and inherent modifications such as SVs, modifications present in very minute amounts (< 1%) may evade the software driven search due to the lack of definitive MS and/or MS/MS signals. Some of these modifications result in differences in the pI of the protein and subsequently lead to the acidic (lower pI) and basic (higher pI) variants of the mAb. These charge variants are separated by ion-exchange chromatography (IEX) and characterized to understand the nature of the PTM. The probability of identifying the PTMs and other variants are enhanced in the purified charged variants due to the enrichment of the modifications in these fractions. The Glutamic acid (E) to Lysine (K) sequence variant described here was first identified during the charge variant characterization of a far basic variant (FBV) in the end-of-fermentation product (EOF) of a monoclonal antibody (mAb X) (Fig. 1a). The protein A purified mAb X was fractionated through CEX and fractions enriched in FBV and the main variant (MV) was analyzed side by side extensively by mass spectrometry to understand the modification present in FBV. The intact and sub-unit (heavy chain and light chain) mass of the charge variant FBV are compared with the main variant (MV) in Table S1 (supplementary material). The main variant deconvolutes to an intact mass of 148,082 Da comprising of two light and two heavy chains with dominant glycoform G0F (termed as G0F/G0F). Additionally, trace amounts of other glycoforms (G1F and G2F) were also observed. 
The same species were [...] C-terminal lysine variants are known modifications in monoclonal antibodies that add positive charge to the net surface charge of the molecule, imparting a basic nature to the antibody 42 . The extracellular carboxypeptidase in mammalian expression systems generally clips off the C-terminal lysine of the heavy chain; the unprocessed antibodies appear as basic variants in the purified pool and add to antibody heterogeneity 43 . However, in the mAb X cation exchange profile, the lysine variants (G0F/G0F + K and G0F/G0F + 2 K) elute just after the main variant (peaks B1, B2, B3) and much before the far basic variant B5 (Fig. 1a). Thus, the far basic nature of charge variant FBV cannot be explained by the presence of lysine at the C terminus of the HC alone and therefore needed further investigation.
Peptide mass fingerprinting (PMF) is a powerful technique for characterizing the primary structure of proteins, including their amino acid sequence and posttranslational modifications (PTMs) 44 . For complete sequence coverage, complementary enzymes are used to generate peptides, which can be separated by reversed-phase high- or ultra-performance liquid chromatography (RP-HPLC/UPLC) and detected with a UV detector 45 . The separated peptides are then investigated for amino acid sequence and PTMs using an accurate, high-resolution and sensitive mass spectrometer.
The enriched main and far basic variants of mAb X were digested by trypsin and the peptides thus generated was separated by liquid chromatography (LC) using a 120 min long gradient of 0.09% TFA in 90:10 acetonitrile: water. The separated peptides were detected by UV detector and then identified by mass spectrometer (Orbitrap LTQ) coupled to the LC outlet. Figure 1b presents a part of the PMF-UV profile overlay of mAb X charge variants MV and FBV. The UV profile overlay was comparable for all the peaks observed except an extra signal observed at 57.39 min in FBV (Fig. 1b). The mass spectrometry (MS) profile of this extra UV signal revealed monoisotopic mass at m/z 906.45 (z = 2), which was not present in MV (Fig. 1c). The single charged (z = 1) m/z of 1811.88 was also observed, however, z = 2 was the dominant charge state. Furthermore, MS/MS analysis identified the sequence of the peptide as VTCVVVDVSHEDPEVK (Fig. 2). This peptide appeared to be truncated part of the heavy chain tryptic peptide TPE 262 VTCVVVDVSHEDPEVK 278 eluting at ~ 72 min (m/z 1070.02, z = 2) in both FBV and MV (refer Fig. 3b). Since the amino acid preceding V 263 is E 262 , trypsin should not cleave at that site. One possibility is that the FBV contains a shorter version of the mAb X, truncated at E 262 . Truncation at heavy chain E 262 site of mAb X will result in a protein with mass of 22,759 Da (with G0F mass), which could be easily detected by intact and sub-unit mass analysis. Further, the fragmented protein will be detected by other impurity identification techniques such as size exclusion chromatography (SEC) or CE-SDS. However, the truncated protein was not identified in FBV by intact and reduced mass analysis (Table S1) and by SEC or CE-SDS analysis (data not shown). 
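The reported m/z values can be cross-checked from the peptide sequences alone. The following is a minimal sketch, assuming standard monoisotopic residue masses and carbamidomethylated cysteine (the specific alkylation chemistry is an assumption; the text states only that cysteines were alkylated). The function name `peptide_mz` is illustrative.

```python
# Monoisotopic residue masses (Da) for the amino acids present in the two peptides.
RESIDUE_MASS = {
    "C": 103.00919, "D": 115.02694, "E": 129.04259, "H": 137.05891,
    "K": 128.09496, "P": 97.05276, "S": 87.03203, "T": 101.04768,
    "V": 99.06841,
}
WATER = 18.01056            # added once per peptide (N-terminal H + C-terminal OH)
PROTON = 1.00728            # mass of one charge-carrying proton
CARBAMIDOMETHYL = 57.02146  # ASSUMPTION: Cys alkylated with iodoacetamide


def peptide_mz(sequence, charge, alkylated=True):
    """Return the monoisotopic m/z of a peptide at the given positive charge."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    if alkylated:
        mass += CARBAMIDOMETHYL * sequence.count("C")
    return (mass + charge * PROTON) / charge


truncated = "VTCVVVDVSHEDPEVK"      # extra peptide seen only in FBV
native = "TPEVTCVVVDVSHEDPEVK"      # expected heavy chain tryptic peptide

print(f"truncated z=2: {peptide_mz(truncated, 2):.2f}")
print(f"truncated z=1: {peptide_mz(truncated, 1):.2f}")
print(f"native    z=2: {peptide_mz(native, 2):.2f}")
```

Under these assumptions the calculated values agree with the observed signals at m/z 906.45 (z = 2), 1811.88 (z = 1) and 1070.02 (z = 2) to within 0.01.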
This led to the hypothesis that some population of the secreted mAb X expresses K or R at amino acid position 262 of the heavy chain, instead of E, thus presenting an additional cleavage site for trypsin and resulting in the shorter peptide V 263 TCVVVDVSHEDPEVK 278 instead of the expected peptide T 260 PE 262 VTCVVVDVSHEDPEVK 278 (Fig. 3a) during trypsin digested peptide map analysis. However, the other part of the peptide (T 260 PK 262 or T 260 PR 262 ) was not detected in this experiment, most likely because of its small size. Thus, the actual substitution (E to K or E to R) could not be confirmed from this experiment. Nevertheless, an E to R substitution would lead to a mass difference of 27 Da in the heavy chain mass, which can be detected by intact and reduced mass analysis. On the other hand, an E to K substitution would lead to a mass difference of only 1 Da, which is not expected to be detected by intact and reduced mass analysis. Thus, the absence of any mass difference (apart from lysine variants) in FBV during intact and sub-unit mass analysis (Table S1) indirectly indicates the presence of the E262K substitution. This hypothesis was further verified by Glu-C digested peptide map, as described below. The extracted ion chromatograms of the native peptide T 260 PE 262 VTCVVVDVSHEDPEVK 278 and the truncated peptide V 263 TCVVVDVSHEDPEVK 278 in FBV and MV are shown in Fig. 3b, indicating the presence of both peptides in FBV and only the native peptide in MV. The partial purity of FBV could lead to the presence of wild-type mAb X in FBV, subsequently generating the native peptide. Additionally, it is plausible that the E262K/R mutation is present in only one heavy chain of the E262K/R substituted mAb X, thus generating both native and substituted peptides during the PMF analysis of FBV. 
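The reasoning above can be illustrated with a simple in silico digest. This is a minimal sketch, assuming the common trypsin rule (cleavage C-terminal to K or R, except before P) and standard monoisotopic residue masses. The heavy chain fragment used here is assembled from the sequences quoted in the text, and the helper name `trypsin_digest` is illustrative.

```python
def trypsin_digest(seq):
    """Cleave C-terminal to K or R unless the next residue is P (simplified rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


# Heavy chain fragment around position 262 (wild type has E, variant has K).
wild_type = "DTLMISRTPEVTCVVVDVSHEDPEVK"
variant = wild_type.replace("TPEVTC", "TPKVTC")

print(trypsin_digest(wild_type))  # native peptide TPEVTCVVVDVSHEDPEVK stays intact
print(trypsin_digest(variant))    # extra cleavage gives TPK + VTCVVVDVSHEDPEVK

# Monoisotopic residue masses (Da) used to compare the two candidate substitutions.
E, K, R = 129.04259, 128.09496, 156.10111
print(f"E->K mass shift: {K - E:+.2f} Da")  # about -1 Da, hard to see at intact level
print(f"E->R mass shift: {R - E:+.2f} Da")  # about +27 Da, detectable at intact level
```

The sketch reproduces the two observations used in the argument: the variant yields the short peptides TPK and VTCVVVDVSHEDPEVK, and only the E to R substitution would produce a mass shift large enough to appear in intact or reduced mass analysis.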
The presence of this truncated peptide was searched by extracted ion chromatogram in all the enriched charge variants of mAb X and it was found to be unique to FBV.
The E262K/R substitution was further confirmed using Glu-C enzymatic digestion of enriched FBV. Fig. S1a shows a schematic of Glu-C digested wild-type and substituted mAb X. The native sequence L 237 LGGPSVFLFPPKPKDTLMISRTPE 262 VTCVVVDVSHEDPE 276 would generate L 237 LGGPSVFLFPPKPKDTLMISRTPE 262 and V 263 TCVVVDVSHEDPE 276 as fragments post Glu-C digestion (in bicarbonate buffer), while the E262K or E262R substituted peptide would not undergo digestion at the 262 site and would appear as L 237 LGGPSVFLFPPKPKDTLMISRTPK 262 VTCVVVDVSHEDPE 276 or L 237 LGGPSVFLFPPKPKDTLMISRTPR 262 VTCVVVDVSHEDPE 276 . The masses corresponding to these peptides were searched, through extracted ion chromatograms (EIC), in the mass spectrometry data from the Glu-C digested peptide maps of the main variant and far basic variant of mAb X. Among these, the masses corresponding to peptides L 237 LGGPSVFLFPPKPKDTLMISRTPE 262 and V 263 TCVVVDVSHEDPE 276 were observed in MV and FBV, while the mass corresponding to L 237 LGGPSVFLFPPKPKDTLMISRTPK 262 VTCVVVDVSHEDPE 276 was detected in FBV (m/z = 1435.75, z = 3 and m/z = 1077.07, z = 4) (Fig. S1b).
Origin of the E to K substitution. Single nucleotide polymorphism (SNP) in the genomic DNA is one of the most common origins of sequence variants in the resultant protein 7,28 . In order to detect the SNP at the genomic level leading to the E262K substitution, the Cast-PCR (Competitive allele-specific Taqman qPCR) technique 46 was employed. The technique utilizes an allele-specific primer for somatic mutant allele detection that competes with an MGB blocker oligonucleotide to suppress the predominant wild-type background, thus allowing 1:1000 (mutant: wild type allele) sensitivity. The E262K substitution is possible only when the triplet codon 'gag' changes to 'aag', and the primers were therefore designed accordingly. 
In brief, the genomic DNA extracted from the mAb X clone was analyzed by qPCR using a primer specific to the wild type (atgatctcccggacccctgaggtcacatgcgtggtggtggacgtg) and a primer specific to the sequence variant (atgatctcccggacccctaaggtcacatgcgtggtggtggacgtg). Amplification was observed with both primers (Fig. 3c), indicating the presence of the SNP corresponding to the base change from 'g' to 'a', which leads to E262K at the protein level. Moreover, the relative abundance of the SNP was estimated from the cycle threshold (Ct) of the PCR reactions and was found to be ~ 1%. The presence of this SNP was further confirmed through next generation sequencing (NGS) using both Illumina and Ion-Torrent platforms (data not shown).
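The relationship between the two primer sequences and the E to K change can be checked directly. The sketch below is illustrative only: it translates both primers in the reading frame that starts at their first base (an assumption that is consistent with the resulting peptide sequence MISRTPEVTCVVVDV), using only the subset of the standard genetic code that these sequences require.

```python
# Subset of the standard genetic code covering the codons in the two primers.
CODON_TABLE = {
    "atg": "M", "atc": "I", "tcc": "S", "cgg": "R", "acc": "T", "cct": "P",
    "gag": "E", "aag": "K", "gtc": "V", "aca": "T", "tgc": "C", "gtg": "V",
    "gac": "D",
}


def translate(dna):
    """Translate a DNA string codon by codon, starting at the first base."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


wild_type = "atgatctcccggacccctgaggtcacatgcgtggtggtggacgtg"
variant = "atgatctcccggacccctaaggtcacatgcgtggtggtggacgtg"

# Positions (0-based) at which the two primers differ.
diffs = [i for i, (a, b) in enumerate(zip(wild_type, variant)) if a != b]
codon_start = (diffs[0] // 3) * 3

print("differing base positions:", diffs)
print("wild-type codon:", wild_type[codon_start:codon_start + 3], "->", translate(wild_type))
print("variant codon:  ", variant[codon_start:codon_start + 3], "->", translate(variant))
```

The two primers differ at a single base, which converts the codon 'gag' (E) to 'aag' (K) and leaves the rest of the translated sequence unchanged.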
Characterization of the SV containing mAb X (mAb X'). The structural and functional features of the modified (E to K substituted at 262 position in heavy chain) mAb X was assessed by several physicochemical and in-vitro bioassay techniques. This study was conducted to understand the structural and functional differences in the SV containing mAb X (called as mAb X' from here on), compared to the native mAb X. Different lots of mAb X may have small differences in product related variants, due to the complex process involved in mAb manufacturing. Also, the inherent variabilities present in the analytical techniques used may also lead to small differences in the variant contents in different lots of mAb X. Thus, data from multiple lots of mAb X (manufactured in-house and sourced from external agencies) was utilized to obtain a range of data for mAb X and the data generated for mAb X' was compared against that range. Nevertheless, to assess the presence of new impurities/variants or to understand the profile differences in case of peptide map and higher order structure methods, three mAb X lots were analyzed side by side with the mAb X' lot.
The mAb X' was enriched and purified from mAb X by cation exchange chromatography and the purity (~ 98%) was confirmed by analytical cation exchange chromatography (Fig. 4a). The second peak observed in purified mAb X' was found to be lysine variant (discussed below). Post purification, mAb X' was buffer exchanged to the mAbX formulation buffer and stored appropriately.
The results obtained from the characterization of mAb X' is summarized in Table 1. The primary structure of the mAb X' and mAb X was compared by intact and sub-unit mass analysis and amino acid sequencing by LC-MS. The intact mass of mAb X' and mAb X was similar and same heavy chain and light chain mass was observed for these two proteins as well (Table S1, Supplementary material). Apart from the extra tryptic peptide (V 263 TCVVVDVSHEDPEVK 278 ) due to the E262K substitution in mAb X' , no other difference was detected in the amino acid sequence of mAb X' and mAb X. Although the mAb X' was not contaminated with mAb X (Fig. 4a), a significant amount of native peptide (T 260 PE 262 VTCVVVDVSHEDPEVK 278 ) was detected in the tryptic peptide map mAb X' (Fig. 4b). This indicates that the E262K substitution is present in only one of the heavy chains of mAb X' , while the other chain is unmodified. Thus, during reduction, mAb X' generates equal amounts of native and modified heavy chains (Fig. 4c) and produces almost equal amounts of native and truncated peptides, post trypsin digestion.
The disulfide links in mAb X' and mAb X was assessed by non-reduced Lys-C digested peptide map LC-MS and all the eight disulfide links were found to be conserved in both the proteins. Two extra peaks were observed in the non-reduced peptide map profile of mAb X' (Fig. S2, supplementary material), compared to the mAb X, due to the extra Lys-C digestion site in mAb X' resulting from the E262K sequence variant. The overall secondary structure of these two antibodies was tested by far-UV CD (circular dichroism) and FT-IR (Fourier-transform infrared) spectroscopy. The far-UV CD profile of mAb X' was similar to the profiles of mAb X lots analyzed side by side, while the contribution from different secondary structure elements determined by FTIR was also similar between mAb X' and mAb X. Similarly, no difference was observed between the near-UV CD profiles of mAb X' and mAb X lots, indicating similar tertiary structures in these two products. The melting temperatures obtained from the differential scanning calorimetry (DSC) studies also indicated similar unfolding patterns in mAb X and mAb X' .
The aggregate content in mAb X' was very low and the low molecular weight impurities, measured by non-reduced CE-SDS, were similar to those of mAb X. mAb X is an IgG1 and is Fc glycosylated. The N-glycan profiles of mAb X and mAb X' were also compared and found to be similar. The pI of mAb X' was more basic than that of mAb X due to the substitution of acidic E with basic K, and the same was evident in the pI variant analysis by imaged capillary isoelectric focusing (iCE) (Fig. 5a,b). mAb X' showed three peaks: the first minor peak aligned with the main peak of mAb X and two major peaks aligned with basic peaks B1 and B2 of mAb X; the more basic peak disappeared post carboxypeptidase B (CPB) treatment. Notably, the basicity of mAb X' relative to mAb X in iCE analysis was not as pronounced as that seen in cation exchange chromatography. As mentioned earlier, mAb X' eluted as two peaks in cation exchange chromatography (CEX) (Fig. 4a). The second of the two peaks also disappeared after CPB treatment (Fig. S3, supplementary material), indicating that the second peak is the lysine variant of mAb X'. Hydrophobic interaction chromatography (HIC) separates variants in order of increasing hydrophobicity and is orthogonal in principle to SEC and CEX separation. The HIC profile of mAb X shows four peaks, where peak 3 is the main peak; peaks 1 and 2 correspond to basic variants in the CEX profile (data not shown). Earlier published work by John Douglass et al. also reported lysine variants as early HIC peaks 47 . Interestingly, mAb X' eluted slightly earlier than mAb X in HIC analysis (Fig. 5c), indicating that mAb X' is slightly more hydrophilic than mAb X. 
Since E to K substitution should not enhance the hydrophilicity of the protein (actually E is slightly more hydrophilic than K), increased hydrophilicity in mAb X' is likely to be caused by the slight structural variation of the molecule which either makes the molecule more compact making the hydrophobic residues less accessible or makes the molecule more open making hydrophilic residues more accessible. This structural variation could also lead to some differences in charge distribution of the molecule which is detected in cation exchange chromatography. However, the cIEF is run under denaturing condition and thus was not able to detect the structural variation.
The E262K substitution in mAb X' is in the CH2 region of the antibody and thus may impact the Fc receptor binding activities of mAb X'. The fragment crystallizable γ (Fcγ) receptors and the neonatal Fc receptor (FcRn) interact with the Fc region of mAbs and induce potent and diverse immune responses 48 . Different posttranslational modifications in mAbs, such as N-glycosylation, deamidation and oxidation, are known to affect the interaction with specific Fc receptors 49,50 . The relative Fc receptor binding activities of mAb X' were assessed by a Surface Plasmon Resonance (SPR) based in vitro assay, using mAb X as standard, where a relative binding potency of 0.80-1.25 is considered similar, based on the precision of the assay. As shown in Table 1, the FcγRIa, FcγRIIIa, FcγRIIIb, FcRn and C1q binding of mAb X' was found to be similar to that of mAb X. On the other hand, a marginal increase was observed in the FcγRIIb binding of mAb X', and interestingly, the FcγRIIa binding potency of mAb X' was found to be considerably higher than that of mAb X. Since E262 is not directly involved in FcγRIIa binding to the Fc 48 , the E262K substitution alone is not expected to impact the binding. Thus, these data also indicate the possibility of a structural alteration due to the E262K substitution in mAb X', affecting the FcγRIIa binding to the mAb. This alteration does not seem to impact the global structure and thus was not captured by higher order structure assessment techniques such as CD, FT-IR and DSC; rather, it appears to be local in nature, causing changes in charge distribution and surface hydrophobicity that are picked up by the CEX and HIC techniques, respectively.
Relative quantitation of the sequence variant by peptide mapping fingerprint and extracted ion chromatogram (PMF-EIC). The E262K modification identified in mAb X Fc region affects the Fc receptor binding activities of the mAb in vitro and thus the same can be reflected in vivo as well, affecting the biological function. Additionally, as discussed earlier, the immunogenic effect of this substitution is unknown and very difficult to predict through any in vitro studies. Thus, control of the mAb X' in the final drug substance and drug product is very important. To enable a downstream/purification process for removal of the mAb X' , a method is required to quantify this modification accurately at different in-process stages. The relative abundance of E262K mAb X' can be quantified from LC-MS analysis of the trypsin digested protein, using the equation below (Eq. 1). The quantification of the area under the curve from the corresponding UV signals from the tryptic peptide map profile is the simplest way; however, both the E262K peptide and parent peptide co-elutes with other peptides in the LC profile and thus quantification based on the UV signal would not be accurate enough. Complete separation of these two peptides from other peptides could not be achieved using multiple enzymes and long and shallow gradient (120 min of 2-96% of 0.09% TFA in 90:10 acetonitrile: water). Additionally, the intensity of low levels of substituted peptide was insufficient to provide good UV signal for quantitation. As a result, UV profiling could not be used for relative quantitation and signals from coupled mass spectrometer were used for this purpose. Extracted ion chromatogram (EIC) peak of the E262K and parent peptides from LC-MS were used to quantitate the area under the curves of E262K peptide and parent peptide for relative quantitation as per Eq. (1).
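Eq. (1) is not reproduced in this text. A minimal sketch of a relative quantitation consistent with the description above is given here, assuming that the percentage is calculated as the EIC area of the E262K peptide divided by the summed EIC areas of the E262K and parent peptides. Both this form and the function name `percent_variant` are assumptions, and the peak areas in the example are hypothetical values chosen only to illustrate the arithmetic.

```python
def percent_variant(auc_variant, auc_parent):
    """Relative abundance (%) of the variant peptide from EIC peak areas.

    ASSUMED form: 100 * AUC_variant / (AUC_variant + AUC_parent).
    """
    total = auc_variant + auc_parent
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * auc_variant / total


# Hypothetical EIC areas (arbitrary units), not measured data.
example = percent_variant(auc_variant=1.0e5, auc_parent=9.99e7)
print(f"{example:.2f} % E262K")
```

A calculation of this type is only as reliable as the assumption that both peptides give the same MS response per mole, which is why the response of each peptide needs to be verified experimentally.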
The PMF-EIC method was developed on LTQ Orbitrap XL mass spectrometer (ThermoFisher Scientific) to detect and quantify the E262K substitution at various in-process stages and in drug substance and in drug product to ensure effective control of E262K variant through the purification process. However, in general the PMF-EIC method has two major challenges: (1) matrix or ion suppression by co-eluting peptides; (2) ionization efficiency of the peptides due to sequence and peptide size 18,38 . Thus, the relative quantitation of E262K modification was based on the following two assumptions-(1) The ionization potential of both E262K/mutant and Native/parent peptides are similar because they are similar in size and largely share a common sequence and (2) the MS response is linear in quantitation range of both E262K peptide and parent peptide present in the sample.
Figure 5. Imaged capillary electrophoresis isoelectric focusing (iCE) profiles of mAb X lots and mAb X' (a) before and (b) after CPB digestion. The lysine variant peaks disappeared post CPB digestion (indicated with arrows). (c) HIC profiles of mAb X lots and mAb X' showing the relatively higher hydrophilicity of mAb X'. mAb X sourced from external agencies are labelled as mAb X2.
The PMF-EIC method was tested for different validation parameters as per ICH guideline Q2(R1) to establish its suitability for the intended purpose. Although the method produced repeatable data during multiple analyses within a single day, a high relative standard deviation (~ 19%) was observed during an inter-day precision study over 6 days with a sample containing ~ 0.1% E262K modified peptide (Table 2A). The specificity of the method for this particular modification was tested using the same antibody from a different source (mAb X2) and other mAbs (mAb A and mAb B) having the same sequence in the Fc region. These mAbs do not show far basic variants in CEX analysis and thus are not expected to contain the E262K modification. Interestingly, small amounts (< 0.07%) of the truncated peptide (E262K peptide) were also observed in these antibodies. Similar to the FBV of mAb X, this peptide in mAb X2, mAb A and mAb B eluted at a different RT than the parent peptide, negating the possibility of in-source fragmentation of the parent peptide during MS analysis. These data indicate that a small amount of the E262K peptide (VTCVVVDVSHEDPEVK) can also be generated during sample processing, as a degradation product of the parent peptide (TPEVTCVVVDVSHEDPEVK). Non-specific cleavage by trypsin during the 16 h digestion in Tris-Cl buffer (pH 8), extended storage in auto-samplers at 4 °C and freeze-thaw of digested samples could also contribute to the degradation observed. 
The estimated % substitution was highly variable at these levels indicating that the detection was below the limit of quantitation (LOQ). Based on multiple inter-day analyses of the same batch of mAb X2, maximum of 0.07% substitution was observed and assigned as noise. Similar noise was also observed in other antibodies (mAb A and B) which share similar sequence in Fc region. Further, to establish the linearity and accuracy of the method, synthetic peptides were used. The E262K peptide (VTCVVVDVSHEDPEVK) and the parent peptide (TPEVTCVVVDVSHEDPEVK) were chemically synthesized and alkylated at the cysteines to mimic the E262K and parent peptides obtained during the reduced peptide map analysis. To assess the linearity of the area under the curve (AUC) obtained from E262K peptide over a dynamic range of the concentration, serial dilutions of E262K peptide was analyzed by PMF-EIC method and the signal (AUC) obtained ( Fig. S4a and b, supplementary material) was plotted against the respective peptide concentration. Based on the concentration of samples injected in a typical PMF-EIC experiment (and, considering 100% cleavage by trypsin), the concentration range of the peptide was selected to mimic samples with as low as 0.01% E262K peptide. Although, a linear response was observed from the AUC of the E262K peptide over 86 fmole to 17.2 nmoles concentration range (Fig. S4c, supplementary material), the recovery (actual concentration relative to the concentration calculated from linear plot), for most of the concentration points, was far outside the generally acceptable range of 0.8-1.2 (Table S2, supplementary material). Additionally, to assess the accuracy of the method, E262K peptide and parent peptides were mixed at 1:1 molar ratio (so the expected % E262K is 50) and analyzed through the PMF-EIC method (Table 2B). 
At low column load (~ 8 pmol) the method was accurate enough to estimate the % E262K (47% compared to the expected 50%); however, the variability among the triplicate analyses was very high (CV = 44.3%). On the other hand, although the method was consistent (CV = 5.7%) at high column load, the estimate of % E262K was not accurate (28% compared to the expected 50%). Overall, these results indicate the limitations of this method in estimating % E262K in an accurate and consistent manner and cast reasonable doubt on the basic assumptions of similar ionization potential for the two peptides and of a linear response of the peptides in the quantitation range.
Based on these limitations of the PMF-EIC based method, an SRM based method was developed for accurate quantitation of the E262K variant in mAb X.

Absolute quantitation of the sequence variant by SRM based mass spectrometry (QQQ-SRM).
In an effort to develop an accurate method for more selective and sensitive detection of peptides, a selected reaction monitoring (SRM) approach was utilized. In contrast to PMF-EIC, where the mass of interest is extracted post data acquisition, in SRM parent ions are exclusively selected and fragmented, and dominant daughter ions can be selected to produce the final MS signal. Thus the method is very selective as the final MS [...] S5a and b and Table S3, supplementary material) and the results were largely unsatisfactory. Although the SRM signal was reasonably linear (R 2 = 0.98) across the EK peptide concentration range of 0.16-80 pmol, the recoveries were inconsistent and mostly outside the generally acceptable range of 0.8-1.2. Due to unsatisfactory linearity and sensitivity on the LTQ Orbitrap, the SRM method was explored on a triple quadrupole quantitative MS instrument, the TSQ Quantum Ultra from Thermo. A triple quadrupole (QQQ) tandem mass spectrometer (MS/MS) provides a multiple reaction monitoring (MRM) mode wherein multiple parent and daughter ions can be selected. The dominant charge states of both the E262K and parent peptides were selected and subjected to fragmentation to release daughter ions. The dominant daughter ions were selected to give the final signal for area quantitation against standard calibration plots (in moles) from synthetic E262K and parent peptides. The absolute quantities of the E262K peptide (in moles) and the Native peptide (in moles) were then used to determine the % E262K substitution in mAb X as per Eq. (2).
The gradient method and mobile phases used in PMF-EIC was further optimized for shorter run time (30 min) and increased ionization, and the method was assessed for linearity, accuracy, precision and matrix effects using the synthetic peptides in the desired linearity range. Unlike the PMF-EIC technique, this method was not only linear over a dynamic range of peptide concentrations (Fig. 6), it was also able to measure the concentration accurately (recovery 0.9-1.1) at all the concentration points ( Table 3).
The precision and accuracy was evaluated at four concentration levels of quality control standards: LLOQ (lower limit of quantitation), LQC (Lower quality control), MQC (Medium quality control) and HQC (High quality control). The design of calibration curve is based on expected % E262K content in samples from antibody manufacturing process where very low levels of E262K peptide were observed in comparison to Native peptide. The acceptance criteria were adopted from regulatory guidelines for bioanalytical methods where the observed concentration should be within ± 15% of nominal value at LQC, MQC and HQC and ± 20% for LLOQ [51][52][53] , while four out of six (67%) of QC standards at each concentration level should pass this criterion. The results of this study is summarized in Table 4. The % CV calculated between the six analysis at LQC, MQC and HQC was within 10% for EK and native peptides, while the % CV was less than 15% at LLOQ level for both the peptides. All six analyses at LQC, MQC and HQC level with EK peptide was within the ± 15% of nominal value and four out of six analyses was within ± 20% of nominal value at LLOQ level. On the other hand, in case of the native peptide, all six analyses at LQC and MQC level and five out of the six analysis at the HQC level was within the ± 15% of nominal value and all six analyses at the LLOQ level was within ± 20% of nominal value.
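The acceptance criteria described above can be expressed compactly. The following is a minimal sketch, assuming that % CV is calculated with the sample standard deviation and that a level passes when at least four of six replicates fall within tolerance. The function name `evaluate_qc_level` and the replicate values are hypothetical and are not taken from Table 4.

```python
from statistics import mean, stdev


def evaluate_qc_level(measured, nominal, is_lloq=False):
    """Check one QC level: +/-15 % of nominal (+/-20 % at LLOQ), >= 4 of 6 passing."""
    tolerance = 0.20 if is_lloq else 0.15
    within = [abs(m - nominal) / nominal <= tolerance for m in measured]
    n_within = sum(within)
    return {
        "n_within": n_within,
        # Integer form of "at least 4 out of 6 (67 %)" to avoid float rounding.
        "passes": 6 * n_within >= 4 * len(measured),
        "cv_percent": stdev(measured) / mean(measured) * 100,  # sample SD assumed
    }


# Hypothetical back-calculated amounts (fmol) for a nominal 100 fmol standard.
lqc = evaluate_qc_level([95, 102, 110, 88, 99, 121], nominal=100)
lloq = evaluate_qc_level([85, 115, 130, 100, 70, 90], nominal=100, is_lloq=True)
print(lqc)
print(lloq)
```

In this illustration five of six LQC replicates and four of six LLOQ replicates fall within tolerance, so both levels would meet the criterion.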
The study samples (trypsin-digested mAb X) contain multiple other tryptic peptides as background matrix to the E262K and Native peptides. Moreover, the shorter runtime adopted for this method (to increase throughput) resulted in co-elution of multiple peptides, which can significantly suppress the ionization of the target peptides or reduce selectivity through matrix interference. In the expected range of % E262K substitution in mAb X test samples, the Native peptide is generally highly abundant and the EK peptide is present at very low amounts; the EK peptide was therefore tested for matrix interference through spike recovery. mAb X2, which has an amino acid sequence identical to that of mAb X but does not contain E262K, was used as an E262K-free matrix, and six replicates at four concentration levels of QC samples (LLOQ, LQC, MQC and HQC) of EK peptide were spiked into trypsin-digested mAb X2. The concentration of EK peptide in these spiked samples was estimated by the SRM method and the spike recovery was calculated (Table 5). At all concentration levels at least five out of six replicates met the acceptance criteria, and the % CV between the six replicates was within the acceptable range as well. The recovery of the Native peptide is discussed in the Supplementary material (Table S4 and Supplementary text). However, since the % CV was on the higher side (> 10% at three out of four concentration levels), we adopted the strategy of running n = 2 independent preparations and reporting the average value only when the % CV between the two replicates is ≤ 15%; the analysis is repeated if the % CV between the two replicates is > 15%. This approach provided consistent results in routine analysis when the runtime of the sample sequence was not more than 20 h. Additionally, for effective removal of matrix interference and carry-over in subsequent runs, the flow rate was increased to 1.5 ml/min in the wash step of the LC run, which was diverted to waste.
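The duplicate-preparation rule described above can be sketched in a few lines of Python. The 15% limit is taken from the text; the function name, the return format and the use of the sample standard deviation for n = 2 are assumptions made for illustration.

```python
from statistics import mean, stdev


def report_duplicate(prep_1, prep_2, max_cv=15.0):
    """Apply the n = 2 rule: report the average only if %CV <= max_cv.

    Returns (value_to_report_or_None, percent_cv, decision).
    """
    values = [prep_1, prep_2]
    average = mean(values)
    if average <= 0:
        raise ValueError("results must be positive to compute %CV")
    cv = 100.0 * stdev(values) / average  # sample SD, assumed convention
    if cv <= max_cv:
        return average, cv, "report"
    return None, cv, "repeat"


if __name__ == "__main__":
    # Hypothetical % E262K results from two independent preparations.
    print(report_duplicate(0.040, 0.044))
    print(report_duplicate(0.020, 0.040))
```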
Taken together, the SRM-based approach on the triple quadrupole MS system was validated successfully for E262K SV estimation in mAb X samples, with a lower limit of quantitation as low as 0.007%. This was calculated from the on-column protein load (~ 65 µg) and the lower limit of the EK peptide calibration curve, i.e., 65 fmol.
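The arithmetic behind the stated limit can be retraced with a small Python sketch. Two inputs are assumptions not given in this passage: an antibody molecular weight of about 150 kDa, and two heavy chains (hence two copies of the peptide site) per antibody molecule. With these assumptions, 65 fmol of EK peptide on a 65 µg load works out to roughly 0.0075%, which is consistent with the reported value of about 0.007%.

```python
def lloq_percent(peptide_fmol, protein_load_ug, mab_mw_da=150_000.0, sites_per_mab=2):
    """Express a peptide amount as a percentage of the available peptide sites.

    mab_mw_da and sites_per_mab are assumed values (about 150 kDa IgG1,
    two heavy chains per molecule), not figures stated in the text.
    """
    mab_pmol = protein_load_ug * 1e-6 / mab_mw_da * 1e12  # ug -> g -> mol -> pmol
    site_pmol = mab_pmol * sites_per_mab
    peptide_pmol = peptide_fmol * 1e-3
    return 100.0 * peptide_pmol / site_pmol


if __name__ == "__main__":
    print(lloq_percent(peptide_fmol=65.0, protein_load_ug=65.0))
```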

Control of E262K substituted product during downstream purification. Once a sensitive method was established to quantify the E262K SV, the next step was to control the variant in the final drug product. To achieve this, protein A purified mAb X was fractionated through preparative CEX and the fractions were tested on analytical CEX. As expected, the initial fractions were enriched in acidic variants and the basic variants gradually increased towards the later fractions. B1 and B3 were identified as lysine variants, while B4 was characterized as aggregates. Since B5 was characterized as the E262K sequence variant, all the B5-containing fractions were discarded and the rest of the fractions, along with the CEX load, were analyzed by SRM mass spectrometry. Table 6 provides the distribution of acidic and basic charge variants and the % E262K substitution in all these fractions for a representative batch of mAb X. The CEX load (inclusive of all charge variants and B5) contained 0.456% E262K substitution. Although the fractions reported here did not contain any detectable B5, trace amounts of E262K substitution were still measured in them, illustrating the inconspicuous nature of the sequence variant. The E262K variant was more prominent in the later fractions and, contrary to the basic nature of this variant, the early acidic fractions (F1, F2) also contained relatively higher amounts of E262K SV. The reason for this distribution could be the charge profile of the sequence variant itself. Like mAb X, mAb X' is an antibody that will have its own basic and acidic species. The occurrence of E262K substitution in the later basic fractions of mAb X is due to overlap with acidic variants of mAb X'. The early acidic fractions of mAb X are enriched in fragments, so the E262K detected in these fractions could be fragments of mAb X' eluting there. Interestingly, excluding the early acidic fractions, a correlation between % B3 and % E262K was apparent in this analysis. The same correlation was also explored in another independent batch of mAb X and a linear relationship between % B3 and % E262K was established (Fig. 7).
Generally, a sequence variant at < 0.1% at a single site is considered acceptable, to make sure that the sequence variants in total remain below a threshold (1-2%) 4,7 . However, a very conservative approach was taken here and only the fractions containing ≤ 0.050% E262K substitution were considered for pooling, which corresponds to ≤ 10% B3 as per the linear correlation established between % B3 and % E262K. Keeping B3 below 10% also helped in controlling basic charge variants in the final drug product. Thus, in addition to the established pooling criteria used to control product quality attributes such as fragments, aggregates and deamidation, this criterion (B3 ≤ 10%) was also applied to pool the CEX fractions for further processing. Fractions F3 to F8 were therefore pooled for the batch illustrated in Table 6, and the final drug product obtained had % E262K substitution as low as 0.014. This approach was used to control the E262K variant in ten consecutive batches of mAb X, and the SV was controlled to under 0.04% in all these batches. Further, a pre-clinical toxicology study conducted in monkeys with multiple doses of mAb X containing ~ 0.080% E262K did not reveal any product-specific safety findings. Taken together, the highly sensitive SRM method enabled control of the E262K variant to a level where it does not raise any efficacy or safety concern.
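The pooling decision described above can be expressed as a simple filter. The sketch below is illustrative only: the thresholds (≤ 0.050% E262K, ≤ 10% B3) come from the text, whereas the function name, the data layout and the fraction values are hypothetical and do not reproduce Table 6. Applying both limits together is a choice made for this sketch.

```python
def select_fractions(fractions, max_b3=10.0, max_e262k=0.050):
    """Return the names of CEX fractions meeting both pooling limits."""
    return [
        f["name"]
        for f in fractions
        if f["b3_percent"] <= max_b3 and f["e262k_percent"] <= max_e262k
    ]


if __name__ == "__main__":
    # Hypothetical fraction data (% B3 and % E262K), not values from Table 6.
    fractions = [
        {"name": "F1", "b3_percent": 2.0, "e262k_percent": 0.090},
        {"name": "F3", "b3_percent": 4.0, "e262k_percent": 0.012},
        {"name": "F5", "b3_percent": 7.5, "e262k_percent": 0.030},
        {"name": "F9", "b3_percent": 14.0, "e262k_percent": 0.070},
    ]
    print(select_fractions(fractions))
```

In practice such a filter would sit alongside the other established pooling criteria (fragments, aggregates, deamidation) mentioned in the text, which are not modelled here.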

Discussion
In this communication, we have reported the identification and characterization of a sequence variant in a monoclonal antibody based therapeutic and developed two different LC-MS/MS based approaches to estimate the SV. The more sensitive of the two, the SRM-based approach on a QQQ mass spectrometer, was validated and then used to control the SV in the final drug product during the downstream purification process.
While next generation sequencing (NGS) and software-based SV searches in high resolution LC-MS/MS data generated from enzymatic peptide map analysis of drug product or upstream products are widely used to identify sequence variants in therapeutic proteins, both techniques have challenges, especially when the SV is present in trace amounts. NGS can be time-consuming and costly and may result in false positives, while trace amounts of SV may evade the software-based search because of insufficient MS or MS/MS data. However, an approach combining NGS and LC-MS/MS, where all the hits resulting from NGS analysis are further verified by targeted processing of LC-MS/MS data, can be the most reliable approach for SV identification in a product or clone. In the absence of NGS capabilities, LC-MS/MS based characterization of enriched charge variants was used here to identify any sequence variants in mAb X. The E to K substitution at position 262 in the heavy chain Fc region was identified by tryptic peptide map LC-MS/MS analysis of the far basic variant (FBV), and this finding was further validated by Glu-C digested peptide map LC-MS analysis. The E262K containing mAb X (mAb X') was purified (~ 98%) by CEX and characterized by an array of physicochemical and Fc-related functional assays. Interestingly, although E is more hydrophilic than K, mAb X' appeared to be more hydrophilic than mAb X in HIC analysis, indicating the possibility of a structural difference between these two variants. This observation was further substantiated by the differences observed in the FcγRIIa binding capabilities of the two products. Since E262 is not known to be directly involved in FcγRIIa interaction, it is more likely that a structural alteration due to the E262K modification in mAb X' affects the FcγRIIa binding potency. Additionally, the apparent structural alterations may also lead to the far basic nature of mAb X'.
As evident from the iCE analysis of mAb X and mAb X', the pI of mAb X' is similar to that of the one-lysine variant of mAb X (Fig. 6a). However, in CEX analysis mAb X' elutes much later than the one-lysine variant (B1) of mAb X (Fig. 5a). The separation in CEX depends on the accessible charge of the protein, and the accessible charge may depend on the structure of the protein. Certain structural changes may expose charged residues to the column, changing the column-protein interaction, and thus these variants may elute differently. Hence, the relatively strong basic nature of mAb X' may signify a structural modification in the SV containing protein. However, this structural alteration was not detected by far and near UV CD, FT-IR or DSC, indicating that the global structure may not be impacted. At this moment the exact location of this suspected structural modification is not clear, and high resolution methodologies such as hydrogen-deuterium exchange mass spectrometry (HDX-MS) could be used to pinpoint the exact region of the apparent structural alteration.
Since the modified (SV containing) mAb X elutes as a far basic variant in analytical CEX, the same separation technique can be used during downstream purification to control the SV in drug substance and drug product. To enable this approach, highly sensitive mass spectrometry based methods were developed to estimate the trace amount of SV. Although the peptide map LC-MS-extracted ion chromatography (PMF-LC-MS-EIC) based method is relatively simple and widely used for MS based PTM/variant analysis, this method was not able to estimate the E262K variant with acceptable accuracy and consistency. The method presumes that the peptides involved in the % variant calculation (the Native and EK peptides, in this case) ionize similarly under the given mass spectrometry conditions. However, the method validation results indicate that this assumption may not hold and that the three amino acid difference between these two peptides may introduce differences in ionization behaviour, leading to inconsistent and inaccurate data. The alternative approach, the SRM-based method on a triple quadrupole MS, was found to be much more sensitive and accurate. This method depends on the absolute quantification (in pmoles) of the Native and EK peptides based on the parent and daughter ions specific to these two peptides. Additionally, the SRM-based method was designed to be shorter, thus providing a better turn-around time (TAT) for in-process sample analysis during downstream purification. This method was successfully validated and used as an in-process control to limit the E262K content in the purified mAb X. All the CEX fractions generated during the CEX purification step were analyzed by the SRM-based method and only the fractions containing insignificant amounts (≤ 0.05%) of SV were pooled to proceed further. Generally, CEX fractions are pooled based on certain product quality attributes such as aggregates, fragments and charge variants, and this results in some loss of product. The additional pooling criteria (% E262K ≤ 0.07 and % B3 ≤ 10, based on the correlation between % E262K and % B3) imposed here ensured insignificant amounts (< 0.04%) of E262K SV in the drug product and were used to generate mAb X consistently in the lab and at the pilot and manufacturing scales. Animal toxicity studies were conducted in Cynomolgus monkeys with a drug product containing ~ 0.08% E262K SV and no toxic reactions were reported. Further, the same approach was endorsed by regulatory agencies for manufacturing drug products for clinical use. Interestingly, although the SRM-based method was able to detect and quantify the SV in all the CEX fractions, the SV was below the detection level of the PMF-EIC method in many fractions. The PMF-EIC method was not able to detect the SV in the drug product either. This observation further emphasizes the importance of developing a very sensitive technique to estimate trace amounts of sequence variants. Overall, sequence variants are considered undesirable in biotherapeutics, and appropriate measures should be taken to control SVs at a very early stage of product development. While a combination of NGS and HRMS can be a tool for early detection of SVs at the clone level, the time and cost associated with a reliable NGS assessment may make this approach inaccessible for some development programs, especially at the early stage. In such scenarios, thorough characterization of enriched product variants through multiple analytical techniques can provide reliable information on the nature of the different variants present in the product, including sequence variants. Further, as described here, the inherent chemical and structural nature of the SV can be used to purify out the variant containing product, and the availability of a very sensitive analytical technique to reliably estimate trace amounts of SV is pivotal to this approach.
To our knowledge, such an extensive characterization of a sequence variant in an antibody biopharmaceutical and its control in the final drug product using mass spectrometry has not been demonstrated earlier. At times the clone producing the highest titer and a product with desirable quality attributes may contain trace amounts of SV, and rejecting the clone right away may have serious business implications. Thus, the approach presented here can be used to understand the properties of the SV extensively and, based on that assessment, sensitive techniques and strategies can be designed to control the SV in the purified drug product.

Methods
Samples and materials. The IgG1 mAbs X, X', A and B were expressed in standard CHO cells and purified using standard antibody purification procedures at Biocon. No animals were used for experimentation. mAb X2 was sourced from an external agency. The list of reagents and other materials used is described in the Supplementary material. Reagents and materials used in analytical techniques were procured from various vendors as described below. Dithiothreitol (DTT), Tris base [tris(hydroxymethyl)aminomethane], trifluoroacetic acid (TFA), acetic acid (glacial), calcium chloride dihydrate and hydrochloric acid used in sample processing were purchased from Sigma-Aldrich, and guanidine hydrochloride and iodoacetamide (IAM) were obtained from Sigma. Trypsin (sequencing grade) was purchased from Promega and LysC (sequencing grade modified) was obtained from Roche. Acetonitrile from J.T. Baker was used in mobile phases. Deionized water (18 MΩ cm at 25 °C) for mobile phases was prepared using a Millipore Milli-Q purification system. The peptides VTCVVVDVSHEDPEVK (EK peptide) and TPEVTCVVVDVSHEDPEVK (Native peptide) were custom synthesized by GenScript (Piscataway, NJ). C-13 and N-15 labelled valine containing EK peptide V*TCVVVDVSHEDPEVK and Native peptide TPEV*TCVVVDVSHEDPEVK were used as internal standards and were custom synthesized by Polypeptide (France); * indicates C-13 and N-15 labelling of valine. Primers atgatctcccggacccctgaggtcacatgcgtggtggtggacgtg and atgatctcccggacccctaaggtcacatgcgtggtggtggacgtg were obtained from Life Technologies.

Intact mass analysis.
Intact antibody samples were diluted to a concentration of 1 mg/mL with 0.1% TFA in 50:50 acetonitrile:water and analyzed using reverse-phase LC-MS on a Waters ACQUITY UPLC with a photodiode array (PDA) detector coupled to a Waters Synapt high definition mass spectrometry (HDMS) system equipped with an ESI source. The samples were injected on an ACE 5 C4 column (100 × 2.1 mm) for chromatographic separation. Mobile phase A was 0.1% formic acid in Milli-Q water and mobile phase B was acetonitrile. Elution was achieved using a 10 min gradient of 10-90% acetonitrile. Flow rate and column oven temperature were set at 200 μL/min and 40 °C, respectively, throughout the run. Mass spectrometric analysis was carried out in positive ion mode. A scan range of 2000-4000 m/z was used with a capillary voltage of 3.00 kV and a cone voltage of 40 V. The desolvation gas temperature was set to 300 °C and the source temperature was 120 °C. Trap and transfer collision energy values were 5 V each. The instrument was calibrated in the m/z range of 150-4000 using sodium iodide. Deconvolution of the ESI mass spectra was done using the MaxEnt1 algorithm in MassLynx v4.1 software. The mass range used for deconvolution was 145,000-155,000, with the minimum intensity ratio left and right set to 20%. The damage model was "Uniform Gaussian" and the width at half height was 2.4. The number of iterations was set to 15.

Reduced mass analysis. Intact antibody samples were denatured with guanidine hydrochloride (final concentration of 3 M), reduced with DTT (final concentration of 10 mM) at 37 °C for 1 h and diluted to a final concentration of 1 mg/mL with 0.1% TFA in 50% acetonitrile. The samples were injected on an ACE 5 C4-300 (100 × 2.1 mm; 5 μm particle size; 300 Å pore size) column for chromatographic separation. Mobile phase A was 0.1% formic acid in Milli-Q water. Elution was achieved using a 27 min gradient of 10-50% acetonitrile.

Peptide mass fingerprinting-EIC method.
Intact antibody samples were denatured using guanidine hydrochloride (final concentration of 3 M), reduced using DTT (final concentration of 10 mM) at 37 °C for 1 h and then alkylated using IAM (final concentration of 20 mM) at 37 °C for 1 h. After alkylation, the samples were desalted using a size exclusion GE HiTrap Desalting (5 mL) column at a flow rate of 0.3 mL/min using 0.05% TFA in 40:60 acetonitrile:water as the mobile phase. The protein eluting from the column was collected in a microcentrifuge tube and concentrated in a Savant SPD121P SpeedVac concentrator (Thermo Scientific). The optical density (OD) of the samples was determined by recording the absorbance at 280 nm and correcting for any light scattering at 340 nm using a spectrophotometer, and the final concentration of the protein (mg/mL) was calculated from the OD reading using an extinction coefficient of 1.64 (theoretical extinction coefficient based on the confirmed amino acid sequence). The desalted sample equivalent to 250 µg of collected protein was concentrated further to a final volume of 70 μL for digestion with trypsin.

Preparation of calibration curve using peptide standards. Synthetic peptides for the EK and Native/parent sequences were used as standards for the quantification of % EK in unknown samples. Master stock solutions were prepared by dissolving the lyophilized powder of the respective synthetic peptides (Native and EK) in 50 mM Tris HCl buffer with 1 mM calcium chloride (pH 8.0) to obtain a 1 mg/mL solution. The working stocks of the Native/parent and EK peptides were prepared separately by denaturing (using guanidine hydrochloride), reducing (using DTT) and alkylating (using IAM) the 1 mg/mL master stock solution and further diluting to 0.12 mg/mL (Native) and 0.04 mg/mL (EK) using the diluent (2% acetic acid in 20:80 acetonitrile:water).
Tables S5 and S6 show the scheme of preparation of standards for calibration curve and quality control standards of EK and Native/parent peptides from respective working stock solutions.
For recovery experiments 100 µL of appropriate standard was added to 900 µL of mAb A' trypsin digested sample and 50 µL was injected on HPLC. For Native peptide, to reduce contribution of inherent Native peptide in mAb A' matrix, 500 µL of appropriate standard was added to 500 µL of mAb A' trypsin digested sample and 50 µL was injected on HPLC.
Internal standards spiking: 5000 ppb levels of EK and Native internal standards were spiked into each calibration standard and samples.
Non reduced peptide mapping using Lys C. Disulphide mapping analysis was performed on a Waters ACQUITY UPLC coupled to a Waters Synapt HDMS system. 100 µg of intact antibody was denatured using 6 M guanidine hydrochloride at 37 °C for 30 min. 1 ml of cooled ethanol was added and the sample was stored at − 20 °C for 1 h to precipitate the protein. The sample was centrifuged at 8000 rpm for 15 min and the collected precipitate was treated with 50 µl of 2 M urea, 2 mM CaCl2, 0.2 M Tris HCl (pH 6.5) and 2.5 µg of Lys C enzyme (Roche sequencing grade modified; reconstituted with MilliQ water) in the ratio of 1:20 (Lys C:antibody, w/w). The reaction mixture was incubated at 37 °C for 48 h. The digested sample was further analyzed by LC-MS. Standard operating conditions were used for LC-MS as described below:

Differential scanning calorimetry. The intact antibody was diluted to 2 mg/ml using placebo and loaded into the sample holder, whereas the reference holder was loaded with the respective placebo. The thermogram was acquired over a temperature scan range of 20-100 °C at a scan rate of 30 °C per hour.
Intrinsic fluorescence spectroscopy. Intrinsic fluorescence was measured on 0.2 mg/ml antibody with excitation at 278 nm, and the emission spectrum was recorded from 300 to 400 nm at a scan rate of 600 nm/min. The average of 10 scans was stored as the final spectrum. Both excitation and emission slit widths were kept at 5 mm.
Non reduced capillary electrophoresis using sodium dodecyl sulphate (NR CE-SDS). CE analysis was performed on a Sciex PA 800 Plus instrument using a 30 cm capillary with a separation voltage of 18 kV applied for 30 min. The antibody was desalted using a 10 kDa MWCO Nanosep device at 8000 rpm and diluted to 1 mg/ml using SDS sample buffer. 2 µl of 10 kDa internal standard and 5 µl of 250 mM iodoacetamide were added to the reaction volume of 100 µl, and the mixture was incubated at 70 °C for 3 min. The reaction mixture was spun at 8000 rpm for 8 min to remove air bubbles and transferred to CE universal vials. The samples were electrokinetically injected at 10 kV for 25 s. 32 Karat software was used for processing the electropherograms.
Imaged capillary iso-electric focusing. iCE analysis was performed on a ProteinSimple iCE 280 using a focusing period of 2 min at 1500 V followed by 5 min at 3000 V. The antibody at 10 mg/ml was desalted against MilliQ water using a 10 kDa MWCO Nanosep device at 13,000 rpm. To 5 µl of desalted antibody, 185 µl of 0.35% methyl cellulose gel (ProteinSimple) containing 8 M urea, 7 µl of Pharmalyte 3-10, 3 µl of Pharmalyte 8-10.5 (GE Healthcare) and 0.2 µl each of pI markers 9.77 and 7.40 (ProteinSimple) were added. The mixture was vortexed, spun at 8000 rpm for 8 min to remove air bubbles and transferred to CE universal vials.
Fc binding using SPR based capture format. Affinities to recombinant human FcγRIIa, FcγRIIb and FcγRIIIa were determined using surface plasmon resonance (SPR) with a Biacore T200/T100 (GE Healthcare). A penta-His antibody (Qiagen) was covalently immobilized on a CM5 chip using standard amine coupling