Which proteins are disease biomarkers? What is the impact of diseases on the proteome? How does genetic variation affect protein abundances? Answering these and many other questions requires methods to quantify proteins in large numbers of samples. To address this demand, Messner et al.1 report in this issue a mass spectrometry (MS) method that quantifies thousands of proteins in mere minutes. Such increases in the throughput of proteomics can empower applications that are impractical to carry out with slower methods (Fig. 1).

Fig. 1: Higher throughput of proteomic analysis increases statistical power.

As the number of samples analyzed per unit time increases, the cost per sample decreases. The increased throughput and reduced cost allow the analysis of more samples, which in turn increases the statistical power of many applications, such as discovering biomarkers and genetic associations. The higher throughput is particularly important for smaller effect sizes that may demand many thousands of samples to be adequately powered.
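The relationship between effect size and required sample number can be illustrated with a standard textbook approximation. The sketch below is not taken from Messner et al.; it assumes a simple two-group comparison of one protein's abundance, a two-sided test, a normal approximation and a standardized effect size (Cohen's d). The function name `samples_per_group` and the default significance level and power are illustrative choices only.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate samples needed per group for a two-sided, two-group test.

    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d is the standardized effect size. This is an illustrative
    assumption, not the analysis used in the article.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


for d in (0.5, 0.1, 0.05):
    print(d, samples_per_group(d))
```

Under these assumptions, a moderate effect (d = 0.5) needs 63 samples per group, whereas d = 0.1 needs 1,570 and d = 0.05 needs 6,280. Correcting for the many proteins tested in parallel would lower the per-test significance level and raise these numbers further.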

Two decades ago, a landmark method, multidimensional protein identification technology (MudPIT), enabled quantification of about 1,500 proteins2. MudPIT set this record by adding a dimension to peptide separation: it used a two-dimensional liquid chromatography approach to separate peptides before tandem MS analysis2. This added dimension was essential to the success of MudPIT, but it also extended the time needed to analyze each sample. In the ensuing two decades, advances in peptide separation, MS instrumentation and MS methods have reduced the analysis time while simultaneously increasing the number of quantified proteins3.

Despite this progress, the analysis time per sample continues to limit the number of analyzed samples and thus the scope and statistical power of investigations4. To relax this limitation, Messner et al. also added a dimension to protein MS analysis: they used a scanning-quadrupole-based data-independent acquisition (DIA) method previously introduced by Moseley et al.5. The scanning quadrupole method allows continuous movement of the precursor isolation window. Messner et al. exploited this movement to assign precursor masses to the MS2 fragment traces, effectively adding another dimension to peptide sequence determination. This dimension allows improved discrimination between correct and incorrect peptide identifications based on the degree of correspondence between the precursors and their matching MS2 fragments. Notably, this dimension does not increase the time needed to analyze a sample and therefore enables the identification of more peptides (and thus proteins) per unit time. Using their method, termed scanning SWATH (sequential window acquisition of all theoretical fragment ion spectra), the authors analyzed a human cell line and report identifying about 2,000 proteins in 30 s of active gradient and about 5,000 proteins in 5 min of active gradient. The method also enhanced quantification precision compared with traditional SWATH, as shown by reduced variability between replicate measurements.

The authors demonstrated the method’s capabilities with two use cases. First, they analyzed the proteomes of budding yeast treated with cytostatic and antifungal drugs. With 7.3 min total analysis time per sample, they analyzed quadruplicates of 16 different drug treatments in mere hours. The data revealed class-dependent changes in protein abundance, suggesting which enzymes are affected by each class of drug. Second, they analyzed plasma samples from 30 patients with COVID-19. The approach quantified 43 biomarkers approved by the US Food and Drug Administration, and a principal component analysis of the data allowed the stratification of patients based on disease severity.
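The instrument time implied by these figures can be checked with simple arithmetic. The sketch below uses only the numbers quoted above (16 treatments, quadruplicates, 7.3 min per sample) and assumes, for illustration, that samples are injected back to back with no blanks, quality-control runs or downtime; the helper names are hypothetical.

```python
def batch_hours(n_conditions, n_replicates, minutes_per_sample):
    """Total instrument time in hours, assuming back-to-back injections."""
    return n_conditions * n_replicates * minutes_per_sample / 60


def samples_per_day(minutes_per_sample):
    """Whole samples that fit into 24 h of uninterrupted acquisition."""
    return int(24 * 60 // minutes_per_sample)


# Yeast drug screen: 16 treatments x 4 replicates at 7.3 min per sample.
yeast_hours = batch_hours(16, 4, 7.3)
print(f"{yeast_hours:.1f} h for 64 samples")
print(samples_per_day(7.3), "samples per day")
```

Under these idealized assumptions the 64 yeast samples take about 7.8 h, and a single instrument could run 197 samples per day, so roughly a thousand samples would take about five days.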

Although these capabilities are exciting and will increase the statistical power of many applications (Fig. 1), the number of identified proteins is about half of the quantifiable proteome of human cell lines3. Furthermore, the maximum gains were achieved with relatively large samples—10 μg of human cell line tryptic digest—and thus may not be applicable to all samples, such as biopsy samples obtained via fine needle aspiration. Thus, highly sensitive and comprehensive proteome quantification on the time scale of a minute remains an open challenge.

To have impact, good ideas need good and accessible implementations. Messner et al. implemented scanning SWATH with accessible hardware and software: the complexity of data acquisition is similar to that of traditional SWATH, and data analysis is performed by open-source algorithms that are made broadly available with the new version of the DIA-NN software suite6. Thus, scanning SWATH is ready for wide deployment. It makes large-scale analysis of many samples cheaper and more practical, which is needed for many applications, such as identifying genetic polymorphisms causing variation in protein abundances7.

The higher throughput afforded by scanning SWATH is an important addition to rapidly advancing MS methods and applications. It can be extended to applications beyond quantifying protein abundances, such as MS methods for quantifying protein structures, interactions and activities. Higher throughput is particularly needed for single-cell proteomics8. So far, the throughput of single-cell protein analysis has benefited from multiplexing methods that allow many samples to be analyzed in parallel9. Combining multiplexing with shorter analysis time can further enhance throughput and increase the robustness of relative quantification.

As the throughput and power of proteomics methods increase, so does the need for rigorous experimental designs. The throughput afforded by scanning SWATH and other recent methods10 can enable the analysis of thousands of samples within days. This quantitative change in throughput can lead to a qualitative change in the biomedical applications employing proteomic technologies (Fig. 1). Associations between genetic variants and protein abundance have already provided insights undetectable by RNA associations7. Increased throughput will afford the analysis of more samples, which in turn will increase statistical power4. Investigations that traditionally have been confined to the transcriptome can expand to incorporate proteomics data. Taking full advantage of these exciting prospects requires careful experimental designs that mitigate batch effects and other systematic artifacts7.