Best practices and benchmarks for intact protein analysis for top-down mass spectrometry

One gene can give rise to many functionally distinct proteoforms, each of which has a characteristic molecular mass. Top-down mass spectrometry enables the analysis of intact proteins and proteoforms. Here members of the Consortium for Top-Down Proteomics provide a decision tree that guides researchers to robust protocols for mass analysis of intact proteins (antibodies, membrane proteins and others) from mixtures of varying complexity. We also present cross-platform analytical benchmarks using a protein standard sample, to allow users to gauge their proficiency.

experimental artifacts 25 . Current top-down sample cleanup methods (for example, protein precipitation 26 and molecular weight cutoff (MWCO) ultrafiltration) are not applicable to all sample types or downstream MS analyses. The demand for robust, generally applicable methods for intact protein MS is the most common request made to members of the Consortium for Top-Down Proteomics 27,28 (http://topdownproteomics.org/).
Our goal here is to address this unmet need by providing a guide that enables users at all levels of expertise to acquire high-quality intact protein mass spectra by ESI-MS. First, we discuss signal suppression associated with common buffer components and biotherapeutic excipients. This discussion explains most failed intact MS measurements and, in addition, offers a path to designing MS-compatible buffers. Then, we present a decision tree based on sample composition and experimental goals, which guides users to a best-practices protocol and corresponding benchmark data.

Origins of signal suppression and signal spreading
Biological, biochemical and biotherapeutic sample preparations usually contain numerous interfering substances (for example, salts, detergents, chaotropes and buffers) that lead to signal suppression during ESI-MS analysis. To provide a theoretical context, we describe the two major drivers of the quality of intact protein (positive ion) ESI-MS and how these are affected by interfering substances. The first driver of quality is the formation of desolvated protein ions, which can be understood in terms of a few critical steps during the ESI process 16,29,30 . Interfering substances generally affect the ESI process after the formation of nanodroplets at the Rayleigh charge limit. Two salient, often opposing, processes that occur within these nanodroplets are the partitioning of net charge toward the droplet surface and the minimization of solvation energy. Polar species such as salts and native proteins partition toward the droplet interior to optimize solvation energy; their ionization, therefore, requires evaporation of solvent molecules 16 . Hydrophobic species such as detergent monomers and unfolded proteins migrate to the droplet surface to optimize solvation energy and, in a faster process that requires less energy, evaporate or are ejected. Many of the techniques presented here for reducing signal suppression can be rationalized within the framework above. For example, organic solvents that decrease surface tension should promote the ionization of both polar and nonpolar analytes; detergents partition to the surface, where they can outcompete analytes for a limited number of protons; organic solvents and acids that unfold proteins should promote ejection-based ionization; native MS (nMS) requires greater desolvation energy and is more sensitive to polar contaminants.
The second driver of the quality of intact protein MS is signal spreading (that is, the distribution of the signal from a single proteoform across multiple channels), which increases with protein mass. Each channel has its own respective noise; consequently, the cumulative noise increases proportionally to the number of channels. The ESI process promotes signal spreading, via adduct formation, by increasing the concentrations of interfering substances and proteins. Heavy isotopes and charge states further distribute signal intensity across multiple channels; the former can be mitigated by isotope depletion 31. Here we describe experimental techniques that minimize signal spreading (increase signal-to-noise ratio, or S/N), including using nMS to reduce the number of charge states, and the use of volatile salts (for example, ammonium acetate) or purification to minimize the effects of alkali metal salts.
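To make the idea of signal spreading across charge states concrete, the following minimal sketch lists the m/z channels occupied by a single proteoform in positive-ion ESI, using the standard relation m/z = (M + z × m_proton)/z. The example mass (about 29 kDa) and both charge-state ranges are illustrative assumptions, not values reported in this work.

```python
PROTON_MASS = 1.007276  # Da; charge carrier in positive-ion ESI


def charge_state_mz(neutral_mass, z_min, z_max):
    """Return {z: m/z} for each protonation state from z_min to z_max inclusive."""
    return {z: (neutral_mass + z * PROTON_MASS) / z for z in range(z_min, z_max + 1)}


# Illustrative assumptions: a ~29 kDa protein with a broad denatured
# charge-state envelope versus a narrow native envelope.
mass = 29000.0
denatured = charge_state_mz(mass, 20, 40)
native = charge_state_mz(mass, 9, 11)
print(len(denatured), "channels (denatured) vs", len(native), "channels (native)")
```

Under these assumed ranges the same total signal is divided among many more channels in the denatured case, which is the sense in which nMS reduces signal spreading.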

Signal suppression by common buffer components
Using the intact protein standard mixture (ubiquitin, myoglobin, trypsinogen and carbonic anhydrase) established by the National Resource for Translational and Developmental Proteomics (NRTDP) (http://nrtdp.northwestern.edu/protocols/), we evaluated common buffer components (Fig. 1) to quantify the concentration required for 50% signal suppression during direct infusion ESI. By analogy to half-maximum inhibitory concentration (IC50) nomenclature, we termed this metric the half-maximum suppression concentration (SC50) (Fig. 1, Supplementary Fig. 1). At their typical concentrations, all common buffer additives suppressed ESI signal considerably. Consistent with the mechanisms of ESI ionization described above, detergents produced the most signal suppression. The SC50 values given in Fig. 1 allow users to design MS-compatible buffers. In addition, the SC50 and buffer composition serve as the entry point into the decision tree outlined below, leading users to the appropriate protocol. Although the trends in SC50 values reported here should generally be consistent across MS platforms, parameter-dependent variations in the reported values are likely (in particular, flow rate, voltages, temperatures, and pressures that affect ionization and desolvation). Here, for example, we calculated SC50 obtained by direct infusion using a standard microflow ESI source (about a microliter per minute), but nano-ESI (less than a microliter per minute) is less affected by salts because of the order-of-magnitude decrease in initial droplet size 32,33.
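Readers who wish to estimate an SC50 on their own platform could do so from a simple titration. The sketch below is one possible approach, assuming that relative signal (signal with additive divided by signal without) has been measured at ascending additive concentrations; it interpolates on a log-concentration axis to find the 50% crossing. The function name, the interpolation scheme and the titration values are illustrative assumptions and do not reproduce the procedure used to generate Fig. 1.

```python
import math


def estimate_sc50(concentrations, relative_signal):
    """Interpolate, on a log-concentration axis, where relative signal crosses 0.5.

    concentrations: ascending additive concentrations (any single unit)
    relative_signal: signal at each concentration / additive-free signal
    Returns None if the signal never crosses 50% in the tested range.
    """
    points = list(zip(concentrations, relative_signal))
    for (c_lo, s_lo), (c_hi, s_hi) in zip(points, points[1:]):
        if s_lo >= 0.5 >= s_hi and s_lo != s_hi:
            frac = (s_lo - 0.5) / (s_lo - s_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Hypothetical titration (mM additive, fraction of additive-free signal).
sc50 = estimate_sc50([0.1, 1.0, 10.0, 100.0], [0.95, 0.60, 0.20, 0.05])
print(f"Estimated SC50: {sc50:.2f} mM")
```

Because SC50 depends on source parameters, a value estimated this way applies only to the instrument settings under which the titration was acquired.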

Intact protein MS (IPMS) decision tree
The IPMS decision tree (Fig. 2) directs practitioners to a protocol or a combination of protocols based on buffer composition, the number of proteins in the sample, and whether native or denaturing conditions are to be used. Consider, for example, a purified protein in phosphate-buffered saline (PBS). Based on the 1.5 mM SC50 exhibited by NaCl (Fig. 1c) and the 137 mM NaCl present in PBS, a protein sample in PBS requires a 91-fold dilution to achieve 50% of the potential MS signal. Therefore, if the protein concentration is greater than 90 µM and salt adducts will not impede data analysis, the sample can be diluted following Protocol 1. Otherwise, sample cleanup by ultrafiltration using spin cartridges with a MWCO membrane is recommended following Protocol 2.
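The PBS arithmetic above can be sketched as a short calculation. The function name is hypothetical, and the 1 µM target protein concentration is taken from Protocol 1; the sketch simply divides the additive concentration by its SC50 and scales the target protein concentration by the same factor.

```python
def dilution_plan(additive_mM, sc50_mM, target_protein_uM=1.0):
    """Return (fold dilution to reach the SC50, minimum stock protein conc. in uM)."""
    fold = max(1.0, additive_mM / sc50_mM)  # no dilution needed if already below SC50
    min_stock_uM = fold * target_protein_uM
    return fold, min_stock_uM


# NaCl in PBS (137 mM) against the NaCl SC50 of 1.5 mM quoted in the text.
fold, min_stock = dilution_plan(137.0, 1.5)
print(f"{fold:.0f}-fold dilution; stock protein should be at least {min_stock:.0f} uM")
```

For PBS this gives a roughly 91-fold dilution and a minimum stock concentration of about 91 µM, in line with the approximately 90 µM threshold stated above.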
Interest in certain PTMs (for example, metallation) or protein complex quaternary structure would dictate the use of native MS methods following Protocol 4b; otherwise the denaturing MS Protocol 4a is recommended. Depending on the complexity of the sample, additional separation techniques such as GELFrEE may be required (Supplementary Fig. 2). The objective of this decision tree is to provide a proven workflow for any sample, not to rule out alternative methods. For example, depending on sample stability, user expertise and available resources, precipitation (Protocol 3), size exclusion 'spin cartridges', or LC (Protocol 5) could be suitable alternatives to MWCO ultrafiltration. All protocols and benchmarks referenced by the decision tree and alternative methods are summarized below and further detailed in Supplementary Notes 1−5 and Supplementary Protocols 1−5.

Protein standards and benchmarks
To promote standardization and allow users to benchmark their own data using readily available proteins, we provided representative results for each protocol using the following commercially available standards: (i) the NRTDP intact protein standard mixture (see Supplementary Note 1 for preparation instructions), (ii) NIST monoclonal antibody reference material 8671 (NISTmAb), containing humanized IgG1κ in 12.5 mM L-histidine, 12.5 mM L-histidine HCl (pH 6.0), and (iii) Sigma bacteriorhodopsin from Halobacterium salinarum (B0184). Benchmarks for mass accuracy depend upon the instrumentation platform and have been reviewed 3,34-39. Rules of thumb include requiring 10 p.p.m. accuracy for modern Fourier transform MS and 20 p.p.m. accuracy for modern quadrupole time-of-flight (QTOF) MS. We suggest the use of ProForma notation 40 for standardized proteoform nomenclature, and note that the PeptideMass tool (https://web.expasy.org/peptide_mass/) can be used to calculate the mass of a given sequence or of proteoforms contained in the UniProt database.
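The mass accuracy rules of thumb above can be checked with a few lines of code. The sketch below computes the error in p.p.m. between an observed and a theoretical mass and compares it with the 10 p.p.m. (Fourier transform MS) and 20 p.p.m. (QTOF) thresholds quoted in the text. The function names, platform labels and example masses are illustrative assumptions.

```python
# Thresholds from the rules of thumb in the text; the dictionary keys are
# hypothetical labels chosen for this example.
TOLERANCE_PPM = {"FTMS": 10.0, "QTOF": 20.0}


def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass, theoretical_mass, platform):
    """True if the absolute p.p.m. error meets the platform's rule of thumb."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= TOLERANCE_PPM[platform]


# Hypothetical measurement: 0.15 Da high on a 10 kDa proteoform.
error = ppm_error(10000.15, 10000.0)
print(f"{error:.1f} p.p.m.",
      within_tolerance(10000.15, 10000.0, "FTMS"),
      within_tolerance(10000.15, 10000.0, "QTOF"))
```

In this hypothetical case the error is about 15 p.p.m., which would meet the QTOF rule of thumb but not the Fourier transform MS one.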

Protocol 1: sample preparation by dilution of interfering substances
Consistent with the mechanisms of ESI and signal spreading detailed above, common buffer components render proteins undetectable by MS (Fig. 3). Minimally complex, concentrated protein solutions can often be analyzed by direct infusion, following dilution to ~1 µM final protein concentration in the appropriate sample buffer. Users should consider using this protocol if dilution can decrease the concentration of a given interfering substance below its SC50 value (Fig. 1, Supplementary Protocol 1). Assuming a practical upper limit of ~10 mM protein concentration, this protocol is potentially applicable to any of the components listed in Fig. 1. As detailed above, however, nMS utilizes an ESI process that is more sensitive to many interferents, including salts. Consequently, dilution is less likely to adequately improve nMS. Protocol 4 describes methods to dilute native proteins into whichever solution will be used to introduce samples to the MS. However, mass spectra obtained by this method have the lowest S/N of any of the protocols described here and may contain adducts.

Protocol 2: sample preparation using MWCO ultrafiltration
We recommend remediating nonvolatile salt adducts by buffer exchange into a solution of volatile salts. The MWCO of the ultrafiltration device should not exceed half the molecular mass of any given protein in a sample to prevent possible sample loss. No particular pH is optimal for all proteins, but pH extremes should be avoided, as should pH that is equivalent to a protein's pI, where protein solubility is at a local minimum 41. We recommend using ammonium acetate throughout these protocols owing to its volatility and ability to act as a stabilizing background electrolyte during ESI 42. Ammonium acetate provides maximal buffering around pH 4.75 (acetate) and 9.25 (ammonium), and results in a neutral pH.
Protocol 2b: native membrane proteins. Membrane proteins are estimated to account for 23% of the total human proteome and represent ~60% of targets for currently approved drugs 43,44. The mass analysis of native, intact membrane proteins can further provide key information regarding stoichiometry, ligand binding and lipid association. A typical analysis of a membrane protein complex requires either size-exclusion chromatography (SEC) or MWCO ultrafiltration to remove alkali salt adducts while maintaining the detergent used to solubilize the protein (Supplementary Protocol 2b) 45. This differs fundamentally from the MWCO ultrafiltration used during filter-aided sample preparation (FASP) to improve the bottom-up proteomics analysis of membrane proteins, which removes detergents 46,47. For users interested in native membrane proteins, we recommend the protocols of Robinson and coworkers 45 for detailed sample preparation considerations. We demonstrate an example application of Robinson and coworkers' protocols for the native tetramer of Aquaporin Z (AqpZ) from Escherichia coli (Supplementary Fig. 5).
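The MWCO guidance given at the start of Protocol 2 (a cutoff no greater than half the mass of any protein in the sample) can be expressed as a simple selection rule. In the sketch below, the list of available cutoffs and the approximate protein masses are assumptions chosen for illustration; users should substitute the devices and masses relevant to their own samples.

```python
def max_mwco_kda(protein_masses_kda):
    """Largest permissible cutoff: half the mass of the smallest protein."""
    return 0.5 * min(protein_masses_kda)


def pick_mwco(protein_masses_kda, available_kda=(3, 10, 30, 50, 100)):
    """Pick the largest available cutoff that satisfies the half-mass rule.

    available_kda is an assumed set of device cutoffs, not a recommendation.
    Returns None if no listed device is small enough.
    """
    limit = max_mwco_kda(protein_masses_kda)
    suitable = [cutoff for cutoff in available_kda if cutoff <= limit]
    return max(suitable) if suitable else None


# Approximate masses (kDa) for ubiquitin, myoglobin, trypsinogen and
# carbonic anhydrase, rounded for illustration.
standard_mix = [8.6, 17.0, 24.0, 29.0]
print(pick_mwco(standard_mix))   # limited by ubiquitin
print(pick_mwco([148.0]))        # an intact antibody of roughly 148 kDa
```

For a mixture, the smallest protein sets the limit, so a sample containing ubiquitin would call for the smallest cutoff in this assumed list.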

Protocol 3: sample preparation using protein precipitation
Common precipitation protocols use organic solvents to agglomerate proteins while leaving small molecules, including salts and detergents, solubilized. Whereas MWCO ultrafiltration using Protocol 2a does not rescue protein signal from a preparation containing harsh surfactants (for example, SDS and Triton), precipitation of proteins following Protocol 3 does (Fig. 3, Supplementary Protocol 3). A volume ratio of 1:1:4:3 of aqueous protein sample:chloroform:methanol:water is recommended to precipitate proteins 26. The supernatant is removed by aspiration, and the precipitated pellet can be further washed with one more addition and removal of methanol. Pellets are resolubilized for 15 minutes at −20 °C using a small volume of 80% (v/v) formic acid (~25% of the starting volume) and are then diluted to the starting sample volume with HPLC-grade water or a solution of volatile salts (for example, ammonium acetate) 48. As an alternative method, acetone precipitation has the distinct advantage of leaving many proteins folded. This method, however, has been shown to modify proteins with +98 Da adducts 49, requires longer incubation at −20 °C (at least 1 h), requires that all steps be performed at or below 0 °C to maximize resolubilization, and can be compromised by detergents.
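The volumes implied by the 1:1:4:3 ratio and the resolubilization step can be worked out for any starting sample volume. The following sketch does only that arithmetic; the function and key names are hypothetical, and the order of reagent addition, mixing and centrifugation steps are deliberately not encoded here (see Supplementary Protocol 3).

```python
def precipitation_volumes(sample_ul):
    """Reagent volumes (uL) for a sample:chloroform:methanol:water ratio of 1:1:4:3."""
    ratio = {"sample": 1, "chloroform": 1, "methanol": 4, "water": 3}
    volumes = {name: parts * sample_ul for name, parts in ratio.items()}
    # Resolubilize in 80% (v/v) formic acid at ~25% of the starting volume,
    # then dilute back to the starting sample volume.
    volumes["formic_acid_80pct"] = 0.25 * sample_ul
    volumes["final_volume"] = sample_ul
    return volumes


for reagent, volume in precipitation_volumes(100).items():
    print(f"{reagent}: {volume} uL")
```

For a 100 µL sample this corresponds to 100 µL chloroform, 400 µL methanol, 300 µL water and 25 µL of 80% formic acid for resolubilization.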

Protocol 4a: denaturing direct-infusion MS
Denaturing direct-infusion ESI mass spectra can usually be obtained by introducing samples to the MS in a mixture of 49.95% HPLC-grade acetonitrile, 49.95% HPLC-grade water, and 0.1% formic acid (v/v). A 60:35:5 ratio of HPLC-grade methanol:water:acetic acid may be used as an alternative and, in some cases, can improve S/N 9,50. As described above, the use of these organic solvents and acids results in efficient ionization from a droplet's surface, often allowing MS analysis to be performed using instrumentation parameters typically used for peptides. A more detailed description of instrument parameters for the Bruker SolariX FT-ICR MS used during denaturing direct infusion studies is found in Supplementary Protocol 4a.

Protocol 4b: native direct-infusion MS
Although native MS protocols may not necessarily produce folded ions that match exactly to their in-solution structures, they can be used to achieve accurate mass measurements of native structures and complexes 51. Consequently, native direct-infusion MS can provide unique structural information, including the characterization of labile PTMs, metal-binding sites, noncovalent interactions with small molecules, and protein tertiary and quaternary structure. Detergent-free samples can be infused directly in aqueous 2.5 mM ammonium acetate 52, the same solution used in the final stage of Protocol 2a (concentrations of ammonium acetate up to 500 mM can even be used). Figure 4 compares mass spectra of carbonic anhydrase in denatured and native states, with the intensity of the base peak in the native sample being about twofold higher than that of the denatured sample. This comparison was repeated in four additional labs on six different instruments to illustrate the possible range of relative intensities (Supplementary Fig. 6, Supplementary Protocol 4b).
Membrane protein complexes with MS-compatible detergents can be infused directly from the final solution described in Protocol 2b 45 . To observe native membrane proteins, detergent ions must be removed from the protein-micelle complex by increased collisional activation. This may be achieved through an increase in collision voltage applied to the source or the collision cell (typically 50−200 V), but it could require additional critical parameters that are described in detail by Robinson and coworkers, and in part in Supplementary Protocol 4b 45,53,54 .

Protocol 5: intact protein analysis using LC-MS
Ionization suppression by excipients and by other proteins generally makes the analysis of multiple proteins and proteoforms by direct infusion intractable. For example, many 'high-purity' proteins (as judged by SDS-PAGE) contain numerous proteoforms that cannot be reliably detected and quantitatively assessed without up-front separation 55,56 . Liquid phase separation approaches, including LC (for example, reversed-phase (RP), size-exclusion, ion exchange, chromatofocusing) and capillary electrophoresis techniques (for example, capillary zone electrophoresis, capillary isoelectric focusing) can remove excipients and provide the resolving power for deep characterization of proteins. As directed in the decision tree (Fig. 2), separation of particularly complex samples (>100 proteins) requires an additional dimension of separation before LC-MS. Supplementary  Fig. 2 shows the use of GELFrEE separation prior to LC, which fractionates samples on the basis of protein molecular weights and has resulted in the largest number of characterized proteoforms to date 57 .
Protocol 5a: LC-MS of soluble proteins. RP-LC is recommended for all samples containing more than five unique proteins but is also a viable option for samples with fewer proteins, provided they do not contain high salt concentrations (>1 M) or harsh detergents. The recommended reversed-phase LC protocol is described in Supplementary Protocol 5a and at http://nrtdp.northwestern.edu/protocols/. Figure 5 demonstrates that sufficient intact MS signal was attained, and four unique chromatographic peaks were observed, using Protocol 5a with a PLRP-S stationary phase (1,000-Å pore size, 5-µm particle size) on a Dionex UPLC coupled to a Thermo Orbitrap Elite. We provide benchmarks for this standard operating procedure (SOP), as well as for additional data acquired using Monolithic and C4 stationary phases, for six widely used platforms (Waters Xevo G2-S QTOF, Supplementary Fig. 7; Bruker Impact II QTOF and Bruker SolariX FT-ICR, Supplementary Fig. 8; Thermo Orbitrap Elite, Thermo Orbitrap Fusion Lumos, and Thermo Orbitrap QE-HF, Supplementary Figs. 9 and 10). To allow users to compare their performance with that of experienced operators using instruments that are operating within specifications, we report S/N for the platforms used here (Fig. 5). However, instrument vendors use proprietary, non-standardized techniques to preprocess data, display data and determine S/N, and, as a result, our data cannot be used for a cross-platform comparison. As an example of a viable alternative method that is notably better suited for proteoforms with similar mass and RP-LC retention (for example, deamidation), we provide a separation of the same protein mix using capillary zone electrophoresis (Supplementary Fig. 11).
Protocol 5b: intact membrane protein LC-MS. Denaturing LC-MS of intact membrane proteins is not straightforward because of their inherent hydrophobicity 58,59. Whitelegge et al. provided the earliest example of denaturing LC-MS of membrane proteins using high concentrations of mobile phase additives and demonstrated that ESI of membrane proteins could achieve the 0.01% mass accuracy benchmark established for ESI of soluble proteins 58. For thorough reviews of the current state of membrane protein analysis via LC-MS 60,61 and the corresponding protocols, we direct readers to refs. 60-62. Current denaturing LC-MS methods for membrane proteins use either size-exclusion 63,64 or reversed-phase separation. Owing to the ease of implementation across a variety of MS platforms, we suggest analysis via reversed-phase LC-MS using a polystyrene-divinylbenzene co-polymer stationary phase (PLRP-S, 300 Å, Agilent). We do not recommend the use of long chain bonded stationary phases such as C8 and C18, as membrane proteins are likely to be retained on the column. As an example, we solubilized enriched bacteriorhodopsin from H. salinarum (Sigma B0184) in 88% formic acid to separate the protein from lipid contaminants. To avoid the risk of formic acid adduction (+28 Da), samples are immediately injected onto the column and solvent exchanged to much lower acid concentrations (0.1%). In the case of membrane protein preparations containing high enough concentrations of lipid contaminants to confound analysis or damage the column, we recommend precipitation following Protocol 3 before analysis. Proteins are eluted using an increasing gradient of 49.95% acetonitrile, 49.95% isopropanol, 0.1% formic acid. Figure 6 shows the analysis of denatured bacteriorhodopsin of H. salinarum following this protocol. Although elution efficiency for some integral proteins may fall well below 100%, PLRP-S columns can be regenerated with 90% formic acid injections.
This protocol was performed in five labs on five different instrument platforms (Supplementary Fig. 12, Supplementary Protocol 5b). An example of an alternative LC-MS method using a more common stationary phase (ZORBAX RRHD 300SB-C3) is provided for aquaporin Z in Supplementary Fig. 5b.
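The sample-preparation artifacts named in the protocols (the +28 Da formic acid adduct in Protocol 5b and the +98 Da adduct associated with acetone precipitation in Protocol 3) can be screened for by comparing deconvoluted masses with the expected proteoform mass. The sketch below is a minimal illustration using the nominal shifts quoted in the text; the function name, the 1 Da tolerance and the example masses are assumptions, and an accurate-mass workflow would use exact monoisotopic or average shifts instead.

```python
# Nominal mass shifts (Da) as quoted in the text; labels are hypothetical.
NOMINAL_SHIFTS_DA = {
    "formic acid adduct": 28.0,
    "acetone precipitation adduct": 98.0,
}


def annotate_shift(expected_mass, observed_mass, tol_da=1.0):
    """Label an observed mass by its offset from the expected mass.

    tol_da is an assumed matching window, not a recommended value.
    """
    delta = observed_mass - expected_mass
    if abs(delta) <= tol_da:
        return "unmodified"
    for label, shift in NOMINAL_SHIFTS_DA.items():
        if abs(delta - shift) <= tol_da:
            return label
    return "unassigned"


# Hypothetical expected mass and deconvoluted peak list.
expected = 27000.0
for peak in (27000.2, 27028.1, 27098.3, 27050.0):
    print(peak, annotate_shift(expected, peak))
```

A peak flagged in this way is only a candidate artifact; other modifications with similar nominal mass would need to be excluded by fragmentation or higher mass accuracy.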

Special methodological considerations for intact antibody mass spectrometry
With the increasing development of biotherapeutics and biosimilars in the pharmaceutical industry, and an increasingly stringent route to regulatory approvals, there is a growing need for intact antibody MS. Every protocol presented here can be applied to the analysis of intact antibodies (Fig. 3, Supplementary Fig. 13, Supplementary Note 4). However, as antibodies are relatively large and signal spreading increases in proportion to protein size, we recommend against the use of Protocol 1 (dilution) for any regulatory filing.

Discussion
The IPMS decision tree (Fig. 2) guides practitioners of all levels toward broadly applicable methods to obtain high-quality intact mass spectra from any protein sample. The protocols described here have been scrutinized and optimized in over ten expert intact protein MS labs, and successfully applied in laboratories without experience in intact protein MS. We hope that these protocols will enable any research group to adopt intact protein mass analysis.
The accurate mass measurement of an intact protein is the sine qua non of top-down mass spectrometry, which can characterize how proteoforms interact and identify PTMs that are lost in other analyses. High-throughput top-down analysis of whole proteomes has proven successful in the unambiguous identification of hundreds of proteins and proteoforms from a single biological sample 65 and revealed prevalent yet previously uncharacterized biologically relevant modifications 66 . Quantitative top-down proteomics has been used to identify disease-relevant differences in protein levels, an encouraging step forward in the field of proteomics-based personalized medicine 67 . Additionally, by using native mass spectrometry following the top-down workflow, one can observe previously unknown protein-protein interactions, protein-ligand binding, protein-cofactor association and protein-complex stoichiometry, and assess their relationships to important biological pathways 68 . We believe that by starting with intact mass analysis, using these intact protein MS protocols coupled to top-down MS analysis, and by identifying proteoforms rather than proteins, scientists can gain new insights into the human proteome. We also hope that these protocols serve as a starting point for users to push, even further, the current limits of high-molecular-weight mass spectrometry.
All general protocols are available as Supplementary Protocols.