Main

Proteomes bridge between genotype and phenotype, and are important for both basic and data-driven biology, biotechnology and systems medicine1,2,3. Indeed, proteins account for more than two-thirds of drug targets4. Proteomes are, however, inherently complex, presenting analytical challenges5. Advances in mass spectrometry instrumentation, chromatography, acquisition methods, data analysis strategies and the introduction of ion mobility devices have increased the depth in single-shot proteome measurements6,7,8,9,10,11,12. However, the requirements of quantitative biology, precision medicine and epidemiology are driving a need to increase throughput, to improve consistency and precision, to facilitate longitudinal studies and to make data acquired across laboratories more comparable2,13,14. Larger, systematic experiments, and more complete data matrices, would also promote the application of advanced statistical methods and machine learning in the analysis of biological and medical problems15,16,17.

A central challenge is to raise throughput in proteomic measurements without compromising identification numbers, quantitative precision and data completeness. Here we report a data-independent acquisition (DIA) method, Scanning SWATH, and software algorithms dedicated to the analysis of highly complex samples measured with short chromatographic gradients in bottom-up proteomics. First, Scanning SWATH adds a new dimension to DIA–MS data that is derived from the use of a ‘sliding’ quadrupole (Q1), which was first introduced as part of the SONAR method18,19. In Scanning SWATH, the scanning dimension is exploited to assign precursor masses to MS/MS traces, which is not possible in conventional DIA methods. Scanning SWATH hence combines the advantages of data-dependent and data-independent proteomic acquisition techniques. Second, in Scanning SWATH, Q1 scans are completed in a shorter time than the stepwise acquisition in conventional SWATH20,21 because there is no need to empty the collision cell between steps.

We demonstrate that the combination of Scanning SWATH and high-flow chromatography (800 µl min–1) generates a technology platform for ultra-fast, yet quantitatively precise, proteome experiments. This setup allows the processing of several hundred samples per day per mass spectrometer, using chromatographic gradients at a minute or even a sub-minute scale. We show that enhanced throughput does not come at the expense of quantitative precision. Indeed, we report coefficient of variation (CV) values that are equal to or better than those for other proteomic techniques of considerably lower throughput. We illustrate the application of ultra-fast proteomics by capturing the proteome response of yeast to cytostatic and antifungal drugs (statins, azoles and antifolates), which allows prediction of the mode of action in addition to revealing compound-specific effects. Moreover, we demonstrate that proteomic gradients as fast as 60 s allow for classification of disease severity in COVID-19 patients on the basis of their plasma proteomes, and we identify a panel of known and new COVID-19 severity biomarkers.

Results

The scanning quadrupole creates a new data dimension

To date, one disadvantage of SWATH–MS over data-dependent acquisition (DDA) techniques has been a lack of precursor mass assignment to MS/MS traces20. Continuous movement of the quadrupole in Scanning SWATH results in a time dependency of the fragment signal. The signal of each MS/MS feature first appears and then disappears when the leading and trailing margins of the sliding isolation window pass the precursor mass, respectively (Fig. 1b). This scanning dimension is complementary to the retention time dimension, because it allows one to distinguish coeluting peptides with different precursor masses and to assign precursor masses to each fragment ion (Fig. 1, lower).

Fig. 1: Scanning SWATH replaces stepwise precursor selection with a continuously moving quadrupole, thereby adding another dimension to the data and shortening duty cycles.

a, In conventional SWATH–MS/DIA–MS, a quadrupole selects a relatively wide mass range and the detector collects MS/MS spectra for a defined accumulation time. The windows are stepped and overlapping (to compensate for edge effects20). The collision cell needs to be emptied after each step. b, In Scanning SWATH, the isolation window slides over the precursor mass range and MS/MS spectra are continuously acquired. The continuous movement of the quadrupole results in a time dependency of the fragment intensity. Fragment signals appear when the leading edge of the quadrupole passes the precursor m/z and disappear when this falls out of the quadrupole isolation window. c, The acquired raw data are sectioned into bins of defined m/z size. Data from TOF pulses that overlap with a certain m/z bin are summed together and written into the respective bin (for example, all TOF pulses labeled in red on the diagram are summed together in the respective bin). Therefore, the highest signal for a fragment is in the bin that includes the respective precursor mass. In contrast to conventional SWATH, data from each TOF pulse are written into more than one bin, resulting in a Q1 profile of triangular shape. d, The Q1 profile provides a fourth dimension in Scanning SWATH data. In conventional SWATH, each fragment mass (mass dimension) has a certain intensity (intensity dimension) that is measured along the chromatographic time (retention time dimension). e, In Scanning SWATH data, each fragment also gives rise to a Q1 profile (Q1 dimension). f, Different fragments from the same precursor show correlating Q1 profiles (for example, green, orange and purple fragments). The apex of the Q1 profile corresponds to the precursor mass, and thus fragments from different precursors can be distinguished (for example, green, orange and purple fragments belong to different precursors than the pink fragment).

To make the scanning dimension exploitable, the acquired Scanning SWATH data are written into defined m/z bins by summing all time-of-flight (TOF) pulses that overlap with the respective precursor range (Fig. 1c). The resulting triangular ‘Q1 profiles’ are then mapped to m/z coordinates by calibrating on known masses and aligning the Q1 profiles with the respective MS1 mass (Methods). To exploit the Q1 profiles in proteomic experiments, we developed open-source algorithms and made them broadly accessible by including them in the open-source DIA–NN software suite22. DIA–NN makes use of the scanning dimension by calculating 15 scores that assess (1) the similarity of the Q1 profiles of the fragments and the nonfragmented precursor, (2) the Q1 profile shapes and (3) the relation of the centroided Q1 profile and the expected precursor mass (Supplementary Fig. 1 and Methods). These scores are then analyzed by the deep neural network (NN) classifier used in DIA–NN22 to assign confidence scores to peptide–spectrum matches and thus obtain q values (Supplementary Fig. 1).
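The origin of the triangular Q1 profile can be illustrated with a toy simulation. The sketch below is not the instrument firmware or the DIA–NN implementation: the function names, the 2-m/z bin size, the scan range and the 0.5-m/z step between simulated TOF pulses are assumptions chosen purely for illustration. Each simulated pulse contains the fragment only while the sliding window covers the precursor, and its signal is added to every precursor bin that the window overlaps.

```python
def q1_profile(precursor_mz, window_width=10.0, bin_size=2.0,
               scan_start=400.0, scan_end=500.0, step=0.5):
    """Toy Q1 profile of one fragment across precursor m/z bins.

    The isolation window [lo, lo + window_width) slides in increments of
    `step`; each position stands in for one TOF pulse (hypothetical values).
    """
    n_bins = int((scan_end - scan_start) / bin_size)
    profile = [0.0] * n_bins
    lo = scan_start - window_width
    while lo < scan_end:
        hi = lo + window_width
        if lo <= precursor_mz < hi:  # fragment is present in this pulse
            for b in range(n_bins):
                b_lo = scan_start + b * bin_size
                b_hi = b_lo + bin_size
                if b_lo < hi and b_hi > lo:  # pulse overlaps this bin
                    profile[b] += 1.0
        lo += step
    return profile


def apex_mz(profile, bin_size=2.0, scan_start=400.0):
    """Return the centre m/z of the most intense bin."""
    best = max(range(len(profile)), key=profile.__getitem__)
    return scan_start + (best + 0.5) * bin_size


profile = q1_profile(443.8)
print(apex_mz(profile))  # centre of the bin that contains the precursor
```

In this toy model the signal rises towards the bin containing the precursor and falls off on either side, so the apex recovers the precursor m/z to within one bin width.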

To test to what extent the use of the Q1 dimension improves true precursor identification from complex DIA spectra, we separated 10 µg of a trypsin-digested human cell line (K562) on a 5-min, high-flow LC gradient introduced recently as part of a high-throughput proteomics workflow23. We used a 10-m/z Scanning SWATH window size because this yields a good compromise between proteomic depth and quantification precision (Methods and Supplementary Fig. 2a). Moreover, we experimentally determined the false discovery rate (FDR) using a two-species spectral library method22,24 (Methods). In the 5-min gradient, Scanning SWATH identified 70% more true-positive precursors at 1% FDR than an optimized, conventional SWATH acquisition method23 run on the same chromatographic gradient and mass spectrometer (Fig. 2a). To illustrate the impact of the additional dimension, we highlight the true-positive target human precursor, AVVIVDDR(2+), for which the apex of the Q1 profile matches the mass of the precursor in the library and thus increases the confidence in this particular identification (Fig. 2b, left). On the other hand, the apex of the Q1 profiles corresponding to the extracted fragment masses of a false target (Arabidopsis thaliana precursor, FDGALNVDVTEFQTNLVPYPR(3+)) does not match the respective precursor mass (Fig. 2b, right). Therefore, this particular false target had a reported q > 0.01 (not identified) when analyzed with Scanning SWATH, but was incorrectly called as true-positive using conventional SWATH (reported q < 0.01). Thus, use of Q1 profiles facilitates a better distinction of true targets from interferences.
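The logic of the two-species validation can be sketched in a few lines. This is a simplified illustration rather than the procedure described in Methods: it assumes that every identification from the species absent from the sample (here labeled "arabidopsis") is false, and it reports the fraction of such identifications among all identifications passing the reported q-value cutoff. The function name and the input format are hypothetical.

```python
def empirical_fdr(identifications, q_cutoff=0.01, true_species="human"):
    """Fraction of accepted identifications that come from the absent species.

    `identifications` is an iterable of (species, reported_q_value) pairs.
    Simplifying assumption: every non-`true_species` hit is a false positive.
    """
    accepted = [species for species, q in identifications if q <= q_cutoff]
    if not accepted:
        return 0.0
    false_hits = sum(1 for species in accepted if species != true_species)
    return false_hits / len(accepted)


# Hypothetical example: 99 human hits and 1 A. thaliana hit pass the cutoff.
ids = [("human", 0.001)] * 99 + [("arabidopsis", 0.005), ("arabidopsis", 0.2)]
print(empirical_fdr(ids))
```

A method that uses additional evidence, such as the Q1 profile apex, lowers this fraction when it pushes false targets above the q-value cutoff.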

Fig. 2: Scanning SWATH improves peptide identification in short gradients.

a, A human cell line digest (10 µg) was acquired with both the Scanning SWATH method (10-m/z window) and a conventional stepped SWATH method23 using a 5-min, high-flow (800 µl min–1) LC gradient. For experimental validation of FDR, the data were analyzed with DIA–NN using a two-species library8,22 containing human and A. thaliana precursors (Methods). b, Left: Q1 profile of fragments corresponding to a true-positive target precursor (human) with a mass of 443.8 m/z (AVVIVDDR(2+)). Right: Q1 profile of fragments corresponding to a false target (A. thaliana precursor) with a precursor mass of 799.1 m/z (FDGALNVDVTEFQTNLVPYPR(3+)). c, The number of protein groups identified (1% FDR) in a K562 cell lysate with Scanning SWATH and conventional stepped SWATH, using 5-, 3-, 1- and 0.5-min chromatographic gradients and adjusted duty cycles (Supplementary Tables 1–3). d, Numbers of precursors (peptides ionized to a specific charge) and peptides (stripped sequences) identified (1% FDR) in human cell lysates measured with different acquisition schemes and platforms. A K562 digest was analyzed with 5-min sSWATH, 1-min sSWATH and 5-min SWATH; 10 µg was injected for the 5-min gradient and 5 µg for the 1-min gradient. To put the results into context, we compared them to a publicly available 5-min-gradient human cell line (HeLa) DIA dataset as recorded with an Evosep One system coupled to an Orbitrap Exploris 480 with FAIMS (5-min DIA–FAIMS) (PXD016662)27. Project-specific libraries and the same software settings were used for raw data analysis (Methods). Data are presented as mean ± s.d. (n = 3 replicate injections). e, Number of protein groups (1% FDR) with at least one or two peptide identifications, respectively. Data are presented as mean ± s.d. (n = 3 replicate injections). f, Number of precursors (left) and protein groups (right) quantified with CV < 20% and CV < 10% in triplicate injections. CV values were calculated from n = 3 replicate injections.

Scanning SWATH records precise proteomes with short gradients

To increase the throughput capacity in proteomics, we recently introduced protocols that use high-flow LC with flow rates of several hundred µl min–1, and with gradient lengths of 5 min and faster23. With overheads of only 3 min between injections, including all washing and equilibration steps, 5-min, high-flow LC gradients facilitate a proteomic throughput of 180 samples per day23. We further showed that high-flow chromatography also offers other benefits to proteomic experimentation: it provides high peak capacities, reduces carryover and improves longitudinal chromatographic and electrospray stability, factors that, in turn, lead to increased quantification precision and data completeness23.
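The throughput figure follows directly from the total time per injection. As a minimal sketch (the function name is ours, and uninterrupted 24-h operation is assumed):

```python
def samples_per_day(gradient_min, overhead_min):
    """Whole injections that fit into 24 h of uninterrupted acquisition."""
    return int((24 * 60) // (gradient_min + overhead_min))


# 5-min gradient plus 3 min of washing and equilibration per injection
print(samples_per_day(5, 3))  # 180
```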

To test the extent to which Scanning SWATH facilitates proteome experiments with fast, high-flow chromatographic gradients, we ran linear gradients of length 5, 3, 1 and 0.5 min at a flow rate of 800 µl min–1. We acquired the data with correspondingly adjusted duty cycles (Methods) and injected 5 µg of K562 tryptic digests. With Scanning SWATH and cycle times as short as 280 ms (30-s gradient), the method recorded on average more than three data points per peak at full-width at half-maximum (FWHM), which corresponds to more than five points per peak width at base (1.7 × FWHM, ref. 25), sufficient for precise quantification in DIA–MS experiments11,26. The 30-s gradient quantified 1,937 protein groups at 1% FDR. The identification numbers increased to 2,720 protein groups with a 60-s chromatographic gradient (cycle time of 310 ms), and to 4,470 protein groups over 5 min (cycle time of 520 ms) when 5 µg of human cell lysate digest was injected (Fig. 2c).
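The relation between cycle time and chromatographic sampling can be made explicit with a short calculation. This is a sketch only: the function names are ours, the 0.9-s peak width is a hypothetical value rather than a measurement from this study, and the factor of 1.7 between FWHM and base width is taken from the text above.

```python
def points_per_peak(fwhm_s, cycle_time_s, base_factor=1.7):
    """Data points across a peak at FWHM and at base (base = 1.7 x FWHM)."""
    at_fwhm = fwhm_s / cycle_time_s
    return at_fwhm, at_fwhm * base_factor


def min_fwhm_for_points(cycle_time_s, points=3):
    """Narrowest FWHM (s) that is still sampled by `points` data points."""
    return points * cycle_time_s


# With a 280-ms cycle, three points at FWHM require peaks at least ~0.84 s wide.
print(min_fwhm_for_points(0.28))
# A hypothetical 0.9-s peak would be sampled >3 times at FWHM, >5 times at base.
print(points_per_peak(0.9, 0.28))
```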

To determine the dependency on the amount of sample injected, we measured a dilution series of K562 tryptic digests ranging from 10 µg to 250 ng with 5-min conventional SWATH and 5-min Scanning SWATH (5-min sSWATH). Scanning SWATH shows overall higher identification numbers, even at lower injection amounts (Supplementary Fig. 3a,b). We further measured a dilution series with microflow LC (5 µl min–1 and 20-min gradients). Here, Scanning SWATH identified 4,896 human protein groups from 1.25 µg of tryptic digest (Supplementary Fig. 3c).

To put the performance of Scanning SWATH into context, we illustrate a benchmark in which we compare 5-min sSWATH with both 1-min Scanning SWATH (1-min sSWATH) and 5-min conventional stepped SWATH (5-min SWATH) on the same LC–MS setup. This benchmark confirmed an increase of 70% in precursor identifications (peptides ionized to a specific charge) compared to stepped SWATH on the same instrument setup on the 5-min, high-flow rate gradient (46,009 versus 26,795) (Fig. 2d). We further compared these results with a reference dataset recorded with an Orbitrap Exploris 480 instrument (Thermo) with field asymmetric ion mobility spectrometry (FAIMS) interface and 5-min separations on an Evosep One LC system (5-min DIA–FAIMS)27. When analyzed with the same software and software settings to render the results comparable, the Evosep 5-min DIA–FAIMS study quantified 15,576 precursors, fewer than the Scanning SWATH precursor identifications in 20% of the gradient time (60-s experiment, 19,401 precursors) (Fig. 2d). The Scanning SWATH experiment identified 5,004 protein groups (corresponding to 4,394 unique proteins, detected with proteotypic peptides), while conventional stepped SWATH and DIA–FAIMS identified, with a comparable gradient length, 3,568 and 3,594 protein groups, respectively (Fig. 2e). Out of these, 3,962, 2,753 and 2,380 protein groups were identified with at least two peptides in 5-min sSWATH, 5-min conventional SWATH and 5-min DIA–FAIMS, respectively.

Scanning SWATH also improves quantification precision, expressed as median CV values of 6.4 and 8.8% for all protein groups and precursors quantified, respectively. In comparison of the same set of protein groups quantified in both Scanning and conventional SWATH runs, the former yielded median CV values of 4.1% compared to 4.9% for the latter (Supplementary Fig. 2b,c shows protein groups and precursors, respectively). In absolute numbers, Scanning SWATH quantified 4,317 protein groups (out of 5,004 identified) with CV < 20% and 3,308 with CV < 10%, while conventional SWATH quantified 3,208 with CV < 20% and 2,527 with CV < 10% (Fig. 2f; left, precursors; right, protein groups).
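For readers who wish to reproduce this type of summary on their own data, the calculation can be sketched as follows. The sketch assumes that the CV is computed as the sample standard deviation divided by the mean of replicate quantities; the function names and the example values are hypothetical and are not taken from this study.

```python
from statistics import mean, median, stdev


def cv_percent(values):
    """Coefficient of variation (%) using the sample standard deviation."""
    return 100.0 * stdev(values) / mean(values)


def summarize_cv(quantities, thresholds=(20.0, 10.0)):
    """Median CV and the number of analytes below each CV threshold.

    `quantities` maps an analyte name to its replicate quantities.
    """
    cvs = {name: cv_percent(vals) for name, vals in quantities.items()}
    counts = {t: sum(1 for cv in cvs.values() if cv < t) for t in thresholds}
    return median(cvs.values()), counts


# Hypothetical triplicate quantities for three protein groups
example = {
    "A": [100.0, 100.0, 100.0],  # CV 0%
    "B": [95.0, 100.0, 105.0],   # CV 5%
    "C": [50.0, 100.0, 150.0],   # CV 50%
}
print(summarize_cv(example))
```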

The quantification precision obtained with Evosep DIA–FAIMS using 5-min gradients in the reference dataset27 is substantially lower than that of Scanning SWATH (Supplementary Fig. 2b,c shows protein groups and precursors). However, we would like to highlight caveats in regard to the direct comparison of quantification and identification performance of the high-flow-rate LC Scanning SWATH to the Evosep/Exploris DIA data27. While conventional SWATH and Scanning SWATH experiments are directly comparable (only the acquisition method differs), the Evosep/Exploris and high-flow quadrupole time-of-flight (qTOF) platforms differ in multiple design parameters, so that the acquisitions differed not only in flow rate and scan mode but also in precursor range, cycle time (1 s for Exploris) and in the fact that the Evosep system uses a separate solid-phase extraction tip for each sample23,27,28. Furthermore, the FAIMS–DIA reference data used a noncommercial mammalian cell line (HeLa) tryptic digest generated with an in-house protocol. We have no influence on these design elements, but we note that they affect both depth and quantification precision, which limits the direct comparability of the two platforms.

Scanning SWATH captures drug responses

If proteomics can be run at high throughput, it has the potential to be used in drug screens as a comprehensive phenotypic readout. Proteomic screening strategies are particularly attractive for those medical applications where classic screening approaches have yielded only a few hits, such as in the development of antifungals29,30. Despite 1.6 million people dying each year from invasive fungal infections, immense screening efforts to date have yielded only three classes of clinically applicable antifungals (azoles, echinocandins and polyenes)31,32. To test whether a high-throughput proteomic approach could be used for exploratory drug screening, we applied high-flow Scanning SWATH to measure proteomes of the unicellular fungus Saccharomyces cerevisiae treated with 16 different drugs from three different drug classes (antifolates, statins and azoles; Supplementary Table 4) at 10 µM working concentration, which is a typical screening concentration in large-scale experiments. These specific classes were chosen because of their clinical availability, well-characterized mode of action, documented resistance mechanisms and, considering azoles, their successful application in antifungal therapies33,34,35. Samples were measured in quadruplicate and pooled samples were repeatedly injected for quality control. Using 5-min, high-flow gradients (7.3 min including column wash and equilibration steps; Supplementary Table 5), the measurement of 103 proteomes (samples plus controls) was completed within 12.5 h. On average 1,980 unique proteins (1% FDR) (Supplementary Fig. 4a) were identified per run. The yeast proteome, which is less complex than the mammalian proteome, was thereby captured at a depth comparable to previous yeast proteomic experiments using SWATH–MS, with several times longer gradients and nano- or microflow-rate chromatography17,36. 
The proteins were quantified with median CV = 8% for instrument control samples (injections of the same sample across measurements) (Supplementary Fig. 4b). High-flow Scanning SWATH hence facilitated the acquisition of proteomes not only in a much shorter time, but indeed also yielded substantial gain of precision compared to previous yeast DIA proteomic studies that required weeks to months to process a similar number of samples17,37,38.

A principal component analysis (PCA) of differentially expressed proteins across each drug class revealed a clear separation of antifolates from statins and azoles (Fig. 3a). Furthermore, the proteomes reflect drug potency within the classes, as indicated by the number of differentially expressed proteins (Supplementary Fig. 4c). For instance, the 10-µM treatment with atorvastatin had a much broader impact on the proteome, with 287 proteins differentially expressed, than treatment with lovastatin, which resulted in 38 differentially expressed proteins (Supplementary Fig. 4c). Furthermore, mapping the affected pathways consistently identified the mode of action related to the drug class, primarily resulting in the upregulation of associated enzymes of the inhibited pathway (Fig. 3b). For example, azoles elicit a strong upregulation of multiple proteins involved in the ergosterol biosynthesis pathway (Fig. 3d and Supplementary Fig. 4e), most notably in the azole target lanosterol 14-alpha demethylase (gene product of ERG11) itself, which is in agreement with previous findings. In contrast, although a response in the same pathway was also observed in statin-treated cells (Fig. 3d and Supplementary Fig. 4d), squalene monooxygenase (a gene product of ERG1), which is significantly upregulated following treatment with statins, was found to be downregulated following treatment with azoles (Fig. 3d,e). Thus, although statins and azoles target the same pathway, the proteomes reveal class-dependent signatures that permit differentiation of these two drug classes. Methotrexate, by comparison, induced upregulation of proteins primarily related to nucleotide, purine and pyrimidine biosynthesis, a clear and different response compared to statins/azoles, linking inhibition of dihydrofolate reductase (a gene product of DFR1) to nucleotide starvation and cessation of DNA replication39. 
The structurally related antifolate, pralatrexate, instead induced change in a broader range of pathways additional to nucleotide biosynthesis, despite its lower potency (Fig. 3b). Each of these drug treatments therefore shows a specific proteome response, and drugs from the same class show similar patterns40.

Fig. 3: Proteome response in drug-treated S. cerevisiae captured with 5-min sSWATH.

a–e, Prototrophic S. cerevisiae (S288c background) yeast cells were grown in minimal medium and treated with 10 µM of the indicated drug; 5 µg of peptides was injected and analyzed with Scanning SWATH and 5-min water-to-acetonitrile chromatographic gradients (800 µl min–1 flow rate). a, Separation of samples by PCA according to both drug class and potency. Proteins differentially expressed in at least one of the drug classes (compared to DMSO) were considered (two-sided t-test, adjusted P < 0.01, Benjamini–Hochberg multiple testing correction82). The quantities were log2 transformed and centered. Drugs with >20 differentially expressed proteins are shown. b, Pathway enrichment of proteomic data identified the target pathways. Pathway enrichment among differentially expressed proteins (two-sided t-test, adjusted P < 0.01, Benjamini–Hochberg multiple testing correction82) was conducted using hypergeometric testing. c, Proteome responses are drug class specific. Differentially expressed proteins in at least one drug class are illustrated as a heatmap. Clustering was performed row-wise but not column-wise. Drugs with >20 differentially expressed proteins are shown. d, Differential protein expression varies by drug class, and identifies the targeted pathways for azoles (left) and statins (right). Significance (–log10(adjusted P value)) was calculated with a two-sided t-test and is plotted as a function of log2 fold changes (ratio of expression levels in drug- and DMSO-treated cells). Proteins in the cholesterol pathway with adjusted P < 0.01 are highlighted and labeled with the respective gene name. The Benjamini–Hochberg procedure was used for multiple testing correction82. e, Treatment with azoles and statins resulted in down- and upregulation of squalene monooxygenase (gene product of ERG1), respectively. Expression levels are shown as fold changes (ratio of expression levels in drug- and DMSO-treated cells). 
Boxes show the first and third quartiles as well as the median (thick line), and the whiskers extend to the most extreme data point that is no more than 1.5× the interquartile range from the box. n = 5 azoles, n = 7 statins. TCA, tricarboxylic acid.
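The Benjamini–Hochberg adjustment used for the differential expression calls in this figure can be written compactly. The following is a generic sketch of the standard step-up procedure, not the code used in this study; the function name and the example P values are ours.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values for four proteins
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A protein would then be called differentially expressed if its adjusted P value falls below the chosen threshold (0.01 in the analyses shown here).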

COVID-19 severity classification with 1-min gradients

In addition to drug-screening approaches, clinical proteomics is a major application for fast proteomic methods. Applied to blood plasma or serum samples, proteomics can (1) classify patients, (2) identify biomarkers and (3) provide molecular signatures for diagnostic and prognostic models23,24,41,42. Fast proteomic methods are specifically relevant during pandemics because they can rapidly assess pathomechanisms in an unbiased fashion23,43,44,45,46. To demonstrate the application of ultra-fast proteomic methods for plasma proteomics, we processed nondepleted citrate plasma samples from a cohort of 30 patients with COVID-19 hospitalized at Charité Universitätsmedizin Berlin between 1 and 26 March 2020 (ref. 47), and compared their plasma proteomes to those of 15 healthy individuals. The cohort was relatively well balanced between individuals suffering from COVID-19 with different levels of severity (Supplementary Table 6 and Fig. 4a), graded according to the WHO ordinal outcome scale of clinical improvement (score 3, hospitalized, no oxygen therapy; score 4, oxygen by mask or nasal prongs; score 5, noninvasive ventilation or high-flow oxygen; score 6, intubation and mechanical ventilation; score 7, ventilation and additional organ support (pressors, renal replacement therapy, extracorporeal membrane oxygenation))48. We analyzed citrate plasma samples using Scanning SWATH in conjunction with 60-s, water-to-acetonitrile chromatographic gradients (3.5 min total run time; Supplementary Table 7). We identified on average around 2,600 precursors (Supplementary Fig. 5a) corresponding to 180 protein groups (Supplementary Fig. 5b). The achieved proteomic depth is hence comparable to precursor identification numbers we previously acquired with fivefold longer LC gradients and conventional SWATH23. 
Forty-seven of the quantified proteins are biomarkers approved by the US Food & Drug Administration (FDA)49, indicating the information richness of the highly abundant plasma protein fraction (Supplementary Table 8). Median protein CV values were 4.4 and 6.6% for instrument quality control (QC) (repeat injection of a pooled digest) and process QC (separately prepared digests from the same commercial plasma sample), respectively (Supplementary Fig. 5c). This indicates that short measurement times do not compromise quantification precision. In parallel, we compared 1-min sSWATH with a conventional SWATH method that uses 5-min chromatographic gradients23. The same cohort was measured with both methods on two different qTOF instruments, and we compared the relative abundance changes measured. Although both methods resulted in similar quantitative changes, Scanning SWATH achieved this with a fivefold shorter gradient than the conventional SWATH method (Fig. 4d for selected proteins, Supplementary Fig. 5d for all proteins). We further compared relative abundance changes of the different precursors (most abundant and second most abundant) from the same protein across all COVID-19 plasma samples (Supplementary Fig. 5e). Although not all peptides derived from the same protein were expected to correlate with one another (due to differential post-translational modification)37,50,51,52, we obtained a linear correlation for the majority of peptide quantities that were assigned to the same protein (Supplementary Fig. 5e).

Fig. 4: Scanning SWATH and 1-min gradients identify biomarkers that classify patients with COVID-19.

a, Plasma samples were taken from 30 patients hospitalized with COVID-19 of varying severity, and from 15 healthy individuals. b, Plasma proteomes classified patients with COVID-19 according to severity. Centered and standardized quantities (z-scores) for 54 proteins that are significantly differentially expressed depending on COVID-19 severity are illustrated on a heatmap (Kendall’s Tau test for the Theil–Sen trend estimator, adjusted P < 0.01, Benjamini–Hochberg for multiple testing82). Clustering was performed row-wise but not column-wise. Labels indicate corresponding gene names. c, PCA separated patients according to disease severity. Proteins found significantly differentially expressed depending on severity were considered. d, The 1-minute sSWATH method gives quantities similar to conventional SWATH but with fivefold shorter gradients. Boxplots are shown comparing 5-min conventional SWATH23 with 1-minute sSWATH in quantification of COVID-19 severity biomarkers as a function of COVID-19 severity. Plots are labeled with gene names that encode the respective proteins: CFI (Complement factor I), GSN (Gelsolin) and ITIH4 (Inter-alpha-trypsin inhibitor heavy chain H4). Intensities were normalized to the mean value of each protein. n = 15 healthy patients, n = 5 patients with mild disease, n = 4 patients with severe disease, n = 8 critical patients. e, COVID-19 severity biomarkers that, to our knowledge, have not previously been associated with COVID-19 severity by proteomics. Plots are labeled with gene names that encode the respective proteins: A2M (Alpha-2-macroglobulin), C1QC (Complement C1q subcomponent subunit C), HPX (Hemopexin), IGHG2 (Immunoglobulin heavy constant gamma 2), IGKV4-1 (Immunoglobulin kappa variable 4–1), PON1 (Serum paraoxonase/arylesterase 1), PROS1 (Vitamin K-dependent protein S), SERPINA7 (Thyroxine-binding globulin), SERPINF2 (Alpha-2-antiplasmin), TMEM198 (Transmembrane protein 198) and TTR (Transthyretin). 
Protein quantities (log2 transformed) are plotted as a function of COVID-19 severity. n = 15 healthy patients, n = 10 patients with mild disease, n = 7 patients with severe disease, n = 13 critical patients (Supplementary Table 6, clinical category). The boxes in d,e show the first and third quartiles, the median (middle) and the whiskers extending to the most extreme data point that is no more than 1.5× the interquartile range from the box.
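The severity trend analysis named in this legend combines a Theil–Sen slope estimate with a Kendall rank correlation. The sketch below illustrates both quantities in plain Python under simplifying assumptions: it computes Kendall's tau-a (no tie correction) and omits the significance test, so it is not equivalent to the statistical procedure used in this study. The function names and the example data are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(x, y):
    """Median of pairwise slopes over all pairs with distinct x values."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    return median(slopes)


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        product = (x[j] - x[i]) * (y[j] - y[i])
        score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical severity grades and log2 protein quantities
severity = [0, 0, 1, 1, 2, 2]
quantity = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
print(theil_sen_slope(severity, quantity), kendall_tau_a(severity, quantity))
```

Because many patients share the same severity grade, a tie-corrected statistic would normally be preferred in practice; tau-a is shown here only because it is the simplest to state.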

In total, 54 of the quantified proteins were significantly up- or downregulated (adjusted P < 0.01) depending on COVID-19 disease severity (Supplementary Table 9). These characterize the host response to the infection, as illustrated by a heatmap grouping patients/individuals according to disease severity (WHO score; Fig. 4b). Additionally, PCA allows separation of patients according to disease severity in the first PC (Fig. 4c), indicating that the 1-min sSWATH runs capture clinical classifiers for COVID-19 severity. Out of the 54 significantly changed proteins, 43 have previously been related to COVID-19 severity23,43,45. We further identified 11 proteins that are changing significantly and, to our knowledge, have not previously been associated with COVID-19 severity (Fig. 4e). Several of these belong to the acute phase response and the complement cascade, which are involved in the antiviral host response. For example, we detected upregulation of both alpha-2-macroglobulin (gene product of A2M) and alpha-2-antiplasmin (gene product of SERPINF2) and downregulation of vitamin K-dependent protein S (gene product of PROS1), highlighting the role of coagulation in COVID-19 infections. Alpha-2-macroglobulin and alpha-2-antiplasmin are protease inhibitors that inactivate thrombin53 and plasmin54, respectively, while vitamin K-dependent protein S is an anticoagulant plasma protein. Moreover, we detected upregulation of complement C1q subcomponent subunit C (gene product of C1QC) and downregulation of immunoglobulin kappa variable 4–1 (gene product of IGKV4-1), both involved in activation of the classical complement pathway and thus the innate immune response. We found downregulation of thyroxine-binding globulin (gene product of SERPINA7) and transthyretin (gene product of TTR), both involved in binding and transport of thyroid hormones, which is consistent with previous studies that found associations of thyroid dysfunction with the severity of COVID-19 infection55,56. 
Further, we observed downregulation of serum paraoxonase/arylesterase 1 (gene product of PON1), which is associated with the cholesterol-carrying, high-density lipoprotein (HDL), a modulator of innate immune response and inflammation57,58,59,60. This agrees with previous studies that related downregulation of serum paraoxonase/arylesterase 1 with other inflammatory and infectious diseases61,62. Equally, dysregulation of HDL-related proteins (such as apolipoproteins) in patients with severe COVID-19 has previously been reported23,43. Further, hemopexin (gene product of HPX), a heme-binding and transporting protein, is downregulated. This could be due to its role in iron homeostasis, which is known to play a role in viral infections63,64,65. Proteome experiments using LC gradients as fast as 60 s are thus able to capture known and novel information in major infectious disease, including COVID-19, while allowing the measurement of hundreds of samples per day.

Discussion

Bottom-up proteomics has become popular, in part because it substantially increases the number of proteins that can be studied in parallel in biological samples7,8,66,67. More recently, the proteomic field has sought to increase throughput and data robustness. High-throughput proteomics benefits from recent developments in sample preparation, chromatography, data acquisition and data analysis. Automation and sample processing based on 96-well plates allow the preparation of hundreds of samples per day and reduce batch effects that limit large-scale and longitudinal experiments23,67,68,69,70,71,72. Fast, efficient and robust chromatographic separations have been achieved by replacing traditional nanoflow LC73,74 with setups that use higher flow rates. This ranges from microflow LC systems (5–50 µl min–1)24,38,75 to LC devices with preformed gradients27,76. More recently we introduced proteome experiments that make use of high-flow LC (800 µl min–1). In 5-min chromatographic gradients, these allow up to 180 proteome injections day–1 on a single LC–MS instrument while increasing robustness, cost effectiveness and quantification precision in longitudinal proteome experiments23. The development of algorithms to deconvolute complex spectra resulting from fast chromatographic measurements is ongoing, but several major steps have recently been achieved and have increased both proteomic depth and quantification precision, in conjunction with fast chromatographic methods22,77,78,79.

Missing so far have been MS acquisition modes specifically designed for the challenges of complex samples analyzed over a short period of time. Here we demonstrate that the precise acquisition of proteomes in short gradients is facilitated by Scanning SWATH. This method, which requires a fast-scanning qTOF but no proprietary reagents, adds a scanning dimension to the raw data that increases depth and true-positive precursor identification. Scanning SWATH further shortens MS duty cycles and allows narrow precursor isolation windows. Scanning SWATH hence brings the requirements for ‘deep’ proteomes and for high throughput closer together.

We have benchmarked the platform’s identification and quantification performance on a human cell line digest that is commercially available (Promega) and thus facilitates comparability across laboratories. Further, we show the application of this technology in yeast antifungal drug screens, and in classification of patients with COVID-19 based on plasma samples. In human cell lysates we achieved quantification of 1,937 protein groups in conjunction with a chromatographic gradient as fast as 30 s, and show that, with gradients of 1–5 min, at least 70% more precursors are quantified compared to previous methods. Despite this high throughput, quantification precision is comparable to, if not higher than, the most recent achievements in human and yeast samples, even if similar chromatographic setups, sample preparation and instruments are used17,23,38. Indeed, achieving a median CV of 6.4% in quantification of protein groups in a cell lysate indicates that, despite the high throughput, the combination of high-flow chromatography and Scanning SWATH is among the most precise proteomic methods currently available.

The ideal balance between throughput and proteomic depth in a proteomic experiment is determined by the scientific question asked. Scanning SWATH has benefits for both: it facilitates faster chromatographic gradients and better measurement precision in high-throughput applications, but it also improves proteomic depth with longer gradients. With a 30-min gradient on our high-flow system, conventional stepped SWATH and Scanning SWATH identified 5,958 and 6,564 protein groups, respectively (Supplementary Fig. 6). Scanning SWATH is most advantageous over existing methods in conjunction with fast chromatographic gradients and in the analysis of complex samples, wherein the assignment of precursor masses to MS/MS traces has major benefits in improving true-positive precursor identification. Scanning SWATH combined with 5-min LC gradients measures protein group intensities across four orders of magnitude (Supplementary Fig. 7a), and intensity values are in agreement with those obtained using tenfold longer gradients (Supplementary Fig. 7b). This setup allows the measurement of several hundreds of proteomes per day on a single LC–MS instrument23.

The optimal injection amount depends less on the Scanning SWATH method than on the applied chromatographic flow rate. The amount of sample injected for 5-min gradient methods was 5–10 µg, which is an accessible amount with conventional digestion protocols17,23,80,81. For instance, the digestion of just 5 µl of blood plasma would allow ten injections on the high-flow LC system.

We also note that, in contrast to conventional SWATH where different precursor isotopologs might fall into different windows, Scanning SWATH can preserve the isotopic patterns of fragments because there are no ‘quadrupole edges’ in Scanning SWATH raw data. This may have future applications. For instance, one could exploit this feature to improve stable isotope labeling by amino acids in cell culture (SILAC) experiments, where it has been noted before that in conventional SWATH experiments problems might occur when one of the precursor distributions (for example, light) is split between two windows but the other (for example, heavy) is not20.

Methods

Materials

Water (LC–MS grade, Optima, no. 10509404), acetonitrile (LC–MS grade, Optima, no. 10001334), methanol (LC–MS grade, Optima, no. A456-212) and formic acid (LC–MS grade, Thermo Scientific Pierce, no. 13454279) were purchased from Fisher Chemicals. Human cell lysate (MS Compatible Human Protein Extract, Digest, no. V6951) and trypsin (Sequence grade, no. V511X) were purchased from Promega. DL-dithiothreitol (BioUltra, no. 43815), iodoacetamide (BioUltra, no. I1149), ammonium bicarbonate (eluent additive for LC–MS, no. 40867), yeast nitrogen base without amino acids (no. Y0626) and glass beads (acid washed, 425–600 µm, Sigma, no. G8772) were purchased from Sigma-Aldrich. Urea (puriss. p.a., reag. Ph. Eur., no. 33247H) and acetic acid (eluent additive for LC–MS, no. 49199) were purchased from Honeywell Research Chemicals. Rosuvastatin calcium (no. S2169), fluvastatin sodium (no. S1909), pyrimethamine (no. S2006), pitavastatin calcium (no. S1759), pemetrexed disodium hydrate (no. S7785), pravastatin sodium (no. S3036), clotrimazole (no. S1606), miconazole nitrate (no. S1956), lovastatin (no. S2061), ketoconazole (no. S1353), atorvastatin (no. S2077), methotrexate disodium (no. S5097), simvastatin (no. S1792), uniconazole (no. S3660) and itraconazole (no. S2476) were purchased from Selleck Chemicals, and pralatrexate (no. A4350) was purchased from APExBIO. Control samples for the SARS-CoV-2 study were prepared from commercial human plasma (EDTA, Pooled Donor, Genetex, no. GTX73265).

Clinical samples from patients with COVID-19

Sampling was performed as part of the Pa-COVID-19 study, a prospective observational cohort study assessing the pathophysiology and clinical characteristics of patients with COVID-19 at Charité Universitätsmedizin Berlin47. All patients with SARS-CoV-2 infection as proven by positive PCR from respiratory specimens and willing to provide written informed consent were eligible for inclusion. Exclusion criteria were refusal to participate in the clinical study by the patient or their legal representative, or clinical conditions that do not allow for blood sampling. The study assessed epidemiological and demographic parameters, medical history, clinical course, morbidity and quality of life during the hospital stay of patients with COVID-19. Serial, high-quality biosampling consisting of various sample types with deep molecular, immunological and virological phenotyping was performed. Treatment and medical interventions followed the standard of care as recommended by current international and German guidelines for COVID-19. The severity of illness in the present study follows the WHO ordinal outcome scale48. The Pa-COVID-19 study was carried out according to the Declaration of Helsinki and the principles of Good Clinical Practice (International Conference on Harmonization 1996) where applicable, and was approved by the ethics committee of Charité Universitätsmedizin Berlin (no. EA2/066/20).

Sample preparation

The human cell lysate was obtained commercially (Promega) and was dissolved in 0.1% formic acid. Plasma samples were prepared as previously described23.

The yeast samples for drug response measurements were prepared and digested as follows: The auxotrophic S. cerevisiae strain BY4741 (∆his3, ∆leu2, ∆ura3, ∆met15) was rendered prototrophic by genomic knock-in of the missing genes. This prototrophic, wild-type strain was grown on agar plates containing synthetic minimal medium for 3 days. Subsequently, colonies were inoculated in synthetic minimal liquid medium (25 ml) and incubated at 30 °C for 1 day. The yeast culture was transferred to 96-deep-well plates and drugs were added to achieve a working concentration of 10 μM (1 ml total volume per well). The yeast culture was incubated at 30 °C and was grown overnight to exponential phase. Cells were pelleted by centrifugation at 3,220 relative centrifugal force for 5 min, the supernatant was discarded and plates were stored at −80 °C until further processing.

First, 200 μl of 0.1 M ammonium bicarbonate in 7 M urea and glass beads (~100 mg per well) were added to the frozen pellet. Subsequently, the plates were sealed (Cap mats, Spex, no. 2201) and lysed in a bead beater for 5 min at 1,500 r.p.m. (Spex Geno/Grinder). After 1 min of centrifugation at 4,000 r.p.m., 20 μl of 55-mM DL-dithiothreitol was added (final concentration 5 mM) with mixing, and the samples were incubated for 1 h at 30 °C. Subsequently, 20 μl of 120 mM iodoacetamide was added (final concentration 10 mM) and incubated for 30 min in the dark at room temperature. One milliliter of 100-mM ammonium bicarbonate was added, the plates were centrifuged for 3 min at 4,000 r.p.m. and 230 μl was transferred to prefilled trypsin plates (9 μl of 0.1 μg μl–1 trypsin). After incubation of the samples for 17 h at 37 °C, 24 μl of 10% formic acid was added. The digestion mixtures were cleaned using C18 96-well plates (96-Well MACROSpin C18, 50–450 μl, The Nest Group, no. SNS SS18VL). For solid-phase extraction, 1-min centrifugation steps at the described speeds (Eppendorf Centrifuge 5810 R) were applied to force liquids through the stationary phase. A liquid handler (Biomek NXP) was used to pipette the liquids onto the material to facilitate four 96-well plates per batch. The plates were conditioned with methanol (200 μl, centrifuged at 50g), washed twice with 50% acetonitrile (ACN, 200 μl, centrifuged at 50g and flow-through discarded) and equilibrated three times with 3% ACN and 0.1% formic acid (200 μl, centrifuged at 50, 80 and 100g, respectively, and flow-through discarded). Then, 200 μl of digested samples was loaded (centrifuged at 100g) and washed three times with 3% ACN and 0.1% formic acid (200 μl, centrifuged at 100g). After the last washing step, the plates were centrifuged once more at 180g before elution of peptides in three steps, twice with 120 μl and once with 130 μl of 50% ACN (180g), into a collection plate (1.1 ml, square well, V-bottom). 
The collected material was completely dried on a vacuum concentrator (Eppendorf Concentrator Plus) and redissolved in 40 μl of 3% ACN and 0.1% formic acid before transfer to a 96-well plate (700 μl round, Waters, no. 186005837). QC samples for repeat injections were prepared by pooling 3 μl of each digested sample. All pipetting was done with a liquid handling robot (Biomek NXP automated liquid handler), shaking was performed with a thermomixer (Eppendorf Thermomixer C) after each step and, for incubation, a Memmert IPP55 incubator was used.
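The final reagent concentrations quoted in the protocol above follow from simple dilution arithmetic. The short Python sketch below reproduces them; it assumes that the liquid volume in each well is just the 200 μl of lysis buffer plus the reagents added so far (pellet and bead volumes are ignored), and the function name is illustrative only.

```python
def final_concentration_mM(stock_mM, added_ul, volume_before_ul):
    """Dilution: C_final = C_stock * V_added / (V_before + V_added)."""
    return stock_mM * added_ul / (volume_before_ul + added_ul)


# Assumption: the well contains only the 200 ul of lysis buffer at this point.
lysis_buffer_ul = 200.0

# 20 ul of 55 mM DL-dithiothreitol added to 200 ul
dtt_mM = final_concentration_mM(55.0, 20.0, lysis_buffer_ul)

# 20 ul of 120 mM iodoacetamide added to the resulting 220 ul
iaa_mM = final_concentration_mM(120.0, 20.0, lysis_buffer_ul + 20.0)

# 9 ul of 0.1 ug/ul trypsin in each prefilled well
trypsin_ug = 9.0 * 0.1

print(f"DTT: {dtt_mM:.1f} mM, iodoacetamide: {iaa_mM:.1f} mM, "
      f"trypsin: {trypsin_ug:.1f} ug per well")
```

Under these assumptions the calculation gives 5 mM DL-dithiothreitol and 10 mM iodoacetamide, matching the stated final concentrations, and 0.9 μg of trypsin per well.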

LC–MS

Liquid chromatography was performed on an Agilent Infinity II ultra-high-pressure system coupled to a Sciex TripleTOF 6600. Peptides were separated in reversed-phase mode using an InfinityLab Poroshell 120 EC-C18 at a column temperature of 30 °C. The dimensions of the columns were 2.1 mm internal diameter, 30 mm length and 1.9-μm particle size for the yeast drug screen, and 2.1 mm internal diameter, 50 mm length and 1.9-μm particle size for all other measurements. For K562 benchmarks, a gradient was applied that ramps from 3 to 36% buffer B in 5 min (buffer A: 1% acetonitrile and 0.1% formic acid; buffer B: acetonitrile and 0.1% formic acid) with a flow rate of 800 µl min–1. For washing the column, the flow rate was increased to 1 ml min–1 and the organic solvent was increased to 80% buffer B in 0.5 min, and was maintained for 0.2 min at this composition before reverting to 3% buffer B in 0.1 min. Subsequently the column was equilibrated for 2.1 min (Supplementary Table 10). An IonDrive Turbo V Source was used with ion source gas 1 (nebulizer gas), ion source gas 2 (heater gas) and curtain gas set to 50 psi, 40 psi and 25 psi, respectively. The source temperature was set to 450 °C and the ion spray voltage to 5,500 V.
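As a rough plausibility check, the gradient program described above can be summed to estimate the run-to-run cycle time and the resulting daily throughput. The Python sketch below uses only the step durations given in this paragraph; it assumes that no additional time is needed between runs (for example, for autosampler handling), so the result is an upper bound rather than a measured value.

```python
# Step durations in minutes, taken from the 5-min K562 benchmark method.
GRADIENT_MIN = 5.0        # 3 to 36% buffer B
WASH_RAMP_MIN = 0.5       # ramp to 80% buffer B
WASH_HOLD_MIN = 0.2       # hold at 80% buffer B
REVERT_MIN = 0.1          # return to 3% buffer B
EQUILIBRATION_MIN = 2.1   # column equilibration

MINUTES_PER_DAY = 24 * 60


def runs_per_day(cycle_min):
    """Whole number of back-to-back runs that fit into 24 h."""
    return int(MINUTES_PER_DAY // cycle_min)


cycle_min = (GRADIENT_MIN + WASH_RAMP_MIN + WASH_HOLD_MIN
             + REVERT_MIN + EQUILIBRATION_MIN)

print(f"cycle time: {cycle_min:.1f} min, "
      f"upper bound: {runs_per_day(cycle_min)} runs per day")
```

The summed cycle time of 7.9 min corresponds to at most 182 runs in 24 h, which is in line with the figure of up to 180 injections per day quoted in the Discussion for 5-min gradients.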

For comparison of different gradient lengths (0.5, 1, 3 and 5 min), we applied linear gradients ramping from 3 to 36% buffer B (buffer A: 1% acetonitrile and 0.1% formic acid; buffer B: acetonitrile and 0.1% formic acid) with a flow rate of 800 µl min–1. For Scanning SWATH and conventional stepped SWATH the duty cycles were adjusted accordingly (Supplementary Tables 1 and 3). For conventional SWATH this was done by adjusting the number of variable windows to reach cycle times comparable to Scanning SWATH duty cycles (Supplementary Tables 2 and 3). For this particular comparison, the accumulation times of MS1 and MS/MS scans were 10 and 25 ms, respectively.

The 1-min gradients used for the measurement of patient samples were slightly adjusted: 3 µg of the digested proteins was injected and we applied linear ramping from 3 to 15% buffer B (buffer A: 1% acetonitrile and 0.1% formic acid; buffer B: acetonitrile and 0.1% formic acid) in 0.1 min, followed by linear ramping from 15 to 40% buffer B in 0.9 min (Supplementary Table 7 shows detailed gradient parameters).

For the yeast drug screen we reduced the column length to 3 cm (InfinityLab Poroshell 120 EC-C18, 2.1 mm internal diameter, 30 mm length and 1.9-μm particle size) and increased the flow rate during the column wash to 2.3 ml min–1, which reduced the method overhead time to 140 s (Supplementary Table 5 shows detailed gradient parameters).

Scanning SWATH settings, operation and calibration

The Scanning SWATH runs were acquired with a Scanning SWATH beta version. If not mentioned otherwise, the following settings were applied in the Scanning SWATH runs: the precursor isolation window was set to 10 m/z and a mass range of 400–900 m/z was covered in 0.5 s. These settings provided a compromise between identification and quantification performance. We optimized the window size on yeast (S. cerevisiae) whole-proteome tryptic digests and a 5-min, high-flow, water-to-acetonitrile gradient where we tested window sizes ranging from 3 to 20 m/z, covering a precursor range of 400–900 m/z. The best results in terms of identification and quantitative precision were achieved with a window size of 10 m/z (Supplementary Fig. 2a). Reducing the window size further would have resulted in even higher identification numbers due to reduced interference, but the resulting shorter effective accumulation times would have lowered quantitative precision. Raw data were binned in the quadrupole or precursor dimension into 2-m/z bins, providing a resolution in the Q1 dimension that allowed the effective use of Q1 scores. The MS1 scan was omitted for the benchmarks, and data were acquired in high-sensitivity mode.

The instrument control software calculates a radio frequency/direct current (RF/DC) ramp that is applied to quadrupole filter 1. The ramp is calculated from the experimental start transmission mass, stop transmission mass, transmission width and cycle time. The calculation uses previously acquired calibrations to calculate ramps for the mass and resolution digital-to-analog converters (DACs). The quadrupole start mass is calculated as experiment start mass minus transmission width, and the quadrupole stop mass as experiment stop mass plus transmission width. This allows for correct precursor profiles of all fragments at the boundaries of the experimental mass range. Collision energy is calculated using the +2 Rolling Collision energy equation based on the center masses for each transmission window. This results in a small collision energy spread depending on the width of the transmission window relative to the range being scanned. In these experiments the effect is typically a spread of around 1 eV for a given precursor.
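The padding of the quadrupole ramp described above can be written out in a few lines. The Python sketch below is an illustration only, not the instrument control software; the function name is hypothetical, and the example values are the default experimental range (400–900 m/z) and transmission width (10 m/z) stated earlier in this section.

```python
def quadrupole_ramp_bounds(exp_start_mz, exp_stop_mz, transmission_width_mz):
    """Pad the experimental mass range by one transmission width on each side."""
    q_start = exp_start_mz - transmission_width_mz
    q_stop = exp_stop_mz + transmission_width_mz
    return q_start, q_stop


# Default settings from this section: 400-900 m/z with a 10-m/z window.
q_start, q_stop = quadrupole_ramp_bounds(400.0, 900.0, 10.0)
print(f"quadrupole ramp: {q_start:.0f}-{q_stop:.0f} m/z "
      f"({q_stop - q_start:.0f} m/z swept)")
```

With these settings the ramp runs from 390 to 910 m/z, so that precursors at 400 and 900 m/z are still entered and left by the full transmission window and yield complete precursor profiles.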

The instrument acquisition software organizes ion detection responses into calculated 2-m/z precursor isolation bins given the current TOF pusher pulse number relative to the start of the scan applying the Scanning SWATH offset curve described above. The 2-m/z precursor isolation bins are organized in the data file as adjacent experiments, allowing for the extraction of precursor profiles for any given fragment in a given cycle by tracing fragment response across experiments, as well as normal chromatographic profiles across cycles.

The bins-to-sum (consolidation of data points in time-of-flight dimension) was set to 4 (4 × 25 picoseconds (ps) = 100 ps) for the K562 benchmark experiment, and to 8 (8 × 25 ps = 200 ps) for all other experiments.

Scanning SWATH calibration was obtained while processing each sample file from the sample data. An automated algorithm finds the maximum residual precursors for each transmission window across the entire sample. This results in several accurate mass TOF measurements paired with the centroid of the quadrupole mass traces per quadrupole transmission region, of which there are usually ten or more per 100 dalton (Da). For example, if the algorithm used the three best residual precursors across the LC for a given transmission region and the scan range was 500 Da with a transmission width of 10, there would be 500/10 × 3 = 150 calibration point pairs consisting of quadrupole mass and TOF accurate mass. Since it is possible that an intense peak within the quadrupole transmission region is not in fact a residual precursor, a selection algorithm filters out points using an outlier rejection algorithm that considers local variance. Typically a point is evaluated relative to its neighbors in a 50–100-Da region. Once a multipoint calibration curve is obtained, the calibration is applied to the data by updating the begin and end mass region defined in the header from each experiment stored, such that the center is calculated from the calibration function while maintaining continuity of boundaries in adjacent experiments.
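The point count in the example above, and the idea of rejecting points that disagree with their neighbors, can be illustrated as follows. This Python sketch is not the vendor algorithm: the text states only that outlier rejection considers local variance, so the median and median-absolute-deviation rule used here, the threshold `k`, the `mad_floor` value and all function names are assumptions made for illustration.

```python
from statistics import median


def n_calibration_pairs(scan_range_da, width_da, best_per_region):
    """Number of (quadrupole mass, TOF mass) pairs, as in the worked example."""
    return int(scan_range_da / width_da) * best_per_region


def reject_outliers(pairs, neighbourhood_da=50.0, k=3.0, mad_floor=0.01):
    """Keep pairs whose offset (TOF mass - quadrupole mass) agrees with neighbors.

    Hypothetical rule: a point is kept if its offset lies within k local
    median absolute deviations (MAD) of the local median offset, where
    'local' means all points within +/- neighbourhood_da of its quadrupole mass.
    """
    kept = []
    for q, t in pairs:
        local = [tt - qq for qq, tt in pairs if abs(qq - q) <= neighbourhood_da]
        med = median(local)
        mad = median(abs(x - med) for x in local)
        if abs((t - q) - med) <= k * max(mad, mad_floor):
            kept.append((q, t))
    return kept


# Worked example from the text: 500-Da range, 10-Da width, three best precursors.
print(n_calibration_pairs(500, 10, 3))

# Synthetic data: a constant 0.3-Da offset plus one spurious intense peak.
pairs = [(400.0 + 10 * i, 400.3 + 10 * i) for i in range(20)]
pairs[7] = (470.0, 475.0)
print(len(reject_outliers(pairs)))
```

On the synthetic data, only the spurious point at 470 Da is removed; the remaining pairs would then be passed to the multipoint calibration fit.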

Matching precursors to MS/MS fragment traces in DIA–NN

The DIA–NN method takes full advantage of the fourth dimension in Scanning SWATH data. In DIA–NN, a set of scores is calculated for each precursor–spectrum match (PSM), to distinguish true signals from noise using linear classifiers and an ensemble of deep NNs. DIA–NN also selects the ‘best’ fragment ion per PSM, as the one with the clearest signal, with other fragment ions then being assessed by comparing their MS2 traces to those of the best fragment22. Scores specifically related to Q1 profile assessment have now been added to DIA–NN algorithms. The Q1 profiles are extracted at the apex of the respective elution peak and the following scores are calculated. (1) Those that reflect the similarity of the Q1 profiles of the fragments and the nonfragmented precursor to the Q1 profile of the best fragment. One score is calculated as the sum of correlations between Q1 profiles of the fragments and the Q1 profile of the best fragment, as designated by DIA–NN during candidate elution peak identification22. The other score is the correlation of the Q1 profile of the nonfragmented precursor and the Q1 profile of the best fragment. (2) Scores that reflect how well Q1 profile shapes match the expected triangular shape. For each fragment, a score is calculated with values between 0 and 1, reflecting whether its Q1 profile increases monotonically to the left from the apex. These scores are then multiplied by the correlation between elution profiles of the fragments and the best fragment, and summed across all the fragments. A similar sum is calculated reflecting whether Q1 profiles are monotonically decreasing to the right from the apex. (3) The difference between the centroid of the Q1 profile of the best fragment and the library precursor mass is calculated. 
DIA–NN calculates the scores listed in (1) – (3) at three different scales by using the three, seven or 11 bins closest to the Q1 profile apex, respectively, yielding 3 × (2 + 2 + 1) = 15 scores in total (Supplementary Fig. 1 gives further details on the algorithm).
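To make the structure of these 15 scores concrete, the Python sketch below computes one possible version of each score type on binned Q1 profiles. It is a simplified illustration under stated assumptions, not the DIA–NN implementation: the function names are hypothetical, Pearson correlation is assumed as the similarity measure, and the shape score is approximated as the fraction of bin-to-bin steps that rise toward the apex on each flank.

```python
from math import sqrt

SCALES = (3, 7, 11)  # number of Q1 bins around the apex, as stated in the text


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def rising_fraction(values):
    """Fraction (0..1) of consecutive steps that do not decrease."""
    steps = list(zip(values, values[1:]))
    return sum(b >= a for a, b in steps) / len(steps) if steps else 0.0


def centroid(mz, intensity):
    return sum(m * i for m, i in zip(mz, intensity)) / sum(intensity)


def q1_scores(mz, best, fragments, precursor, elution_corr, library_mz):
    """Return 3 x (2 + 2 + 1) = 15 illustrative Q1 scores.

    mz: Q1 bin centers; best, precursor and each entry of fragments are
    Q1 profiles on those bins; elution_corr[i] is the elution-profile
    correlation of fragment i with the best fragment.
    """
    apex = max(range(len(best)), key=best.__getitem__)
    scores = {}
    for n in SCALES:
        half = n // 2
        lo, hi = max(0, apex - half), min(len(best), apex + half + 1)
        b = best[lo:hi]
        # (1) similarity of fragment and precursor profiles to the best fragment
        scores[f"fragment_corr_{n}"] = sum(pearson(f[lo:hi], b) for f in fragments)
        scores[f"precursor_corr_{n}"] = pearson(precursor[lo:hi], b)
        # (2) triangular shape: both flanks should rise toward the apex
        scores[f"left_shape_{n}"] = sum(
            c * rising_fraction(f[lo:apex + 1]) for f, c in zip(fragments, elution_corr))
        scores[f"right_shape_{n}"] = sum(
            c * rising_fraction(f[apex:hi][::-1]) for f, c in zip(fragments, elution_corr))
        # (3) centroid of the best fragment's profile versus library precursor mass
        scores[f"centroid_delta_{n}"] = centroid(mz[lo:hi], b) - library_mz
    return scores


# Synthetic, perfectly triangular profiles on 2-m/z bins centered at 510 m/z.
mz = [500.0 + 2 * i for i in range(11)]
best = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
fragments = [best, [2 * v for v in best]]
scores = q1_scores(mz, best, fragments, best, [1.0, 1.0], 510.0)
print(len(scores))
```

For this idealized input every correlation and shape term reaches its maximum and the centroid difference is zero; real profiles with interference would score lower, which is what allows the classifiers to separate true signals from noise.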

Only the monoisotopic fragment masses are used for Q1 profile assessment because the Q1 profiles of different fragment isotopologs are shifted relative to each other. We illustrate this for a doubly charged precursor (Supplementary Fig. 8). As one would expect, the Q1 profiles of the +1 13C and the +2 13C fragment isotopologs are shifted by ~0.5 and ~1 m/z relative to that of the monoisotopic fragment, respectively. Depending on precursor mass and fragment mass, a small fraction of the monoisotopic fragments might also result from a +1 13C precursor isotope. This slightly distorts the Q1 profile of monoisotopic fragments but, as this distortion/shift is in the range of the mass accuracy of the quadrupole, its impact is negligible in practice.

Conventional DIA and SWATH runs (for benchmark)

The conventional 5-min SWATH method is based on one previously published23. To render it comparable to the developed Scanning SWATH method, we applied the same 0.5-s duty cycle and the same precursor mass range of 400–900 m/z as in the developed Scanning SWATH method. Each duty cycle consists of one MS1 scan with 20-ms accumulation time, and 17 MS/MS scans with variable windows (Supplementary Table 11) and 25-ms accumulation time.
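The accumulation times above can be checked against the 0.5-s duty cycle with a short calculation. In the Python sketch below, the scan counts and accumulation times are taken from this paragraph; attributing the remaining time to per-scan overhead is an assumption, because the text does not itemize it.

```python
CYCLE_MS = 500      # 0.5-s duty cycle
MS1_MS = 20         # one MS1 scan
N_MSMS = 17         # variable-window MS/MS scans
MSMS_MS = 25        # accumulation time per MS/MS scan

accumulation_ms = MS1_MS + N_MSMS * MSMS_MS
remaining_ms = CYCLE_MS - accumulation_ms

# Assumption: the remainder is spread evenly over all 18 scans as overhead.
per_scan_overhead_ms = remaining_ms / (N_MSMS + 1)

print(f"accumulation: {accumulation_ms} ms, remaining: {remaining_ms} ms "
      f"(~{per_scan_overhead_ms:.1f} ms per scan)")
```

The accumulation times add up to 445 ms, leaving 55 ms of the 500-ms cycle, or roughly 3 ms per scan if distributed evenly.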

The DIA–FAIMS data, acquired on an Evosep One LC system coupled to an Orbitrap Exploris 480, were downloaded from ProteomeXchange (dataset PXD016662). Triplicate runs with 500-ng HeLa tryptic digests loaded on a column (the highest load in this dataset), a compensation value of −45 V for FAIMS, a resolving power of 15,000 and a cycle time of 1 s were considered because these runs provided the best identification numbers while maintaining quantitative accuracy27. DIA–FAIMS data were analyzed with a project-specific library acquired on the same setup (PXD016662; ‘5min-library.kit’). For the analysis in DIA–NN, the library was exported from Spectronaut (v.13.12.200217.43655 (Laika)) with the ‘Export Spectral Library’ function and reannotated with the ‘Reannotate’ function in DIA–NN using the UniProt83 human canonical proteome (UP000005640). The DIA–FAIMS data were analyzed with Spectronaut (v.13.12.200217.43655 (Laika)) and DIA–NN but, as the identification numbers were higher with DIA–NN, we used these values for the benchmark.

Data processing and analysis

Raw data processing was carried out with DIA–NN v.1.7.12 and with default settings in ‘robust LC (high accuracy)’ mode. Protein quantities were obtained using the MaxLFQ algorithm84 as implemented in either DIA–NN (yeast drug screen) or the diann R package (https://github.com/vdemichev/diann-rpackage) (all other samples).

The data processing and batch correction for patient measurements were done as described previously23. Briefly, the report was filtered at 0.01 precursor-level q-value and 0.05 protein-group-level q-value. Intrabatch correction was performed for each peptide precursor separately and based on the sample preparation controls, using linear regression on the injection number. Linear regression was applied only for at least ten data points. Testing of the relation between log2-transformed protein levels and WHO severity grade (as classified according to the WHO ordinal scale48) was performed using Kendall’s Tau test as implemented in the EnvStats R package85 (adjusted P < 0.01, Benjamini–Hochberg for multiple testing82). Proteins were considered for differential expression analysis only when identified in at least 90% of individuals/patients.
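The multiple-testing step mentioned above can be illustrated with a plain implementation of the Benjamini–Hochberg adjustment. The analysis itself was performed in R; the Python sketch below only shows how adjusted P values are derived from raw ones, and the input P values are made up for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for five proteins.
raw = [0.001, 0.008, 0.039, 0.041, 0.042]
adj = benjamini_hochberg(raw)

# Proteins passing the adjusted P < 0.01 threshold used in this study.
significant = [i for i, p in enumerate(adj) if p < 0.01]
print([round(p, 4) for p in adj], significant)
```

In this toy example only the first protein remains below the 0.01 threshold after adjustment, although two raw P values were below 0.01.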

For the analysis of yeast drug screen data, proteins were considered only if detected in >50% of the samples, and samples were removed if they had <80% of the maximum identification number across samples. Only proteins identified with proteotypic (that is, specific) peptides and 0.01 protein q-value were considered. The differential expression analysis (drug-treated versus DMSO-treated) for the yeast drug screen was done on the log2-transformed protein quantities using a t-test (two-sided), considering proteins detected in at least three out of the four replicates. The Benjamini–Hochberg procedure82 was used for multiple testing correction. Drugs were considered for the subsequent analysis only if they had >20 differentially expressed proteins, and samples treated with folic acid and FIN56 were excluded from the analysis because these do not belong to the three studied drug classes.
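The two completeness filters described above can be sketched as follows. The Python example assumes that low-identification samples are removed first and that protein completeness is then evaluated on the remaining samples; the order is not stated in the text, and the function name and toy data are hypothetical.

```python
def filter_screen(detected):
    """detected: dict mapping sample name -> set of identified protein IDs.

    Assumed order: (1) drop samples with <80% of the maximum identification
    number, (2) keep proteins detected in >50% of the remaining samples.
    """
    max_ids = max(len(proteins) for proteins in detected.values())
    kept_samples = {s: p for s, p in detected.items() if len(p) >= 0.8 * max_ids}

    counts = {}
    for proteins in kept_samples.values():
        for protein in proteins:
            counts[protein] = counts.get(protein, 0) + 1
    n = len(kept_samples)
    kept_proteins = {p for p, c in counts.items() if c > 0.5 * n}
    return kept_samples, kept_proteins


# Toy data: sample s4 has far fewer identifications than the others.
detected = {
    "s1": {"A", "B", "C", "D", "E"},
    "s2": {"A", "B", "C", "D"},
    "s3": {"A", "B", "C", "F"},
    "s4": {"A"},
}
kept_samples, kept_proteins = filter_screen(detected)
print(sorted(kept_samples), sorted(kept_proteins))
```

Here sample s4 is removed, and proteins E and F, each seen in only one of the three remaining samples, are excluded from further analysis.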

Coefficients of variation were calculated for each protein or precursor as its empirical standard deviation divided by its empirical mean, and are reported in percentages. CV values were calculated for proteins or precursors identified in at least two replicate measurements. PCA was always performed only on ubiquitously identified proteins; imputation was not used. Heatmaps were generated with the ComplexHeatmap R package and default settings86. Pathway enrichment was performed with the clusterProfiler R package87 and wikipathways database88. Z-scores were calculated by dividing the (centered) protein quantities by their standard deviations. All plots were generated with R (v.3.6.3)89.
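The CV and Z-score definitions above translate directly into code. The Python sketch below assumes that the empirical standard deviation is the sample standard deviation (n − 1 in the denominator); the text does not specify the denominator, and the example quantities are made up.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent; None if fewer than two replicates."""
    if len(values) < 2:
        return None
    return 100.0 * stdev(values) / mean(values)


def z_scores(values):
    """Center the quantities and divide by their standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical protein quantities from three replicate injections.
replicates = [90.0, 100.0, 110.0]
print(cv_percent(replicates), z_scores(replicates))
```

For these three values the CV is 10% and the Z-scores are −1, 0 and 1; a protein seen in a single replicate returns no CV, mirroring the requirement of at least two replicate measurements.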

Spectral libraries

The libraries for the K562 benchmark experiments and for the yeast drug screen were generated from ‘gas-phase fractionation’ runs using Scanning SWATH and small precursor isolation windows. First, 5 µg of K562 cell lysate (Promega) or 5 µg of yeast digests was injected and run on a nanoAcquity ultra-performance LC (Waters) coupled to a SCIEX TripleTOF 6600 with a DuoSpray Turbo V source. Peptides were separated on a Waters HSS T3 column (150 mm × 300 µm, 1.8-µm particles) with a column temperature of 35 °C and a flow rate of 5 µl min–1. A 55-min linear gradient ramping from 3% ACN/0.1% formic acid to 40% ACN/0.1% formic acid was applied. The ion source gas 1 (nebulizer gas), ion source gas 2 (heater gas) and curtain gas were set to 15 psi, 20 psi and 25 psi, respectively. The source temperature was set to 75 °C and the ion spray voltage to 5,500 V. In total, 12 injections were run with the following m/z mass ranges: 400–450, 445–500, 495–550, 545–600, 595–650, 645–700, 695–750, 745–800, 795–850, 845–900, 895–1,000 and 995–1,200. The precursor isolation window was set to m/z 1 except for the mass ranges m/z 895–1,000 and m/z 995–1,200, where the precursor windows were set to m/z 2 and m/z 3, respectively. The cycle time was 3 s, consisting of high- and low-energy scans, and data were acquired in ‘high-resolution’ mode. The spectral libraries were generated using library-free analysis with DIA–NN directly from these Scanning SWATH acquisitions. For this DIA–NN analysis, MS2 and MS1 mass accuracies were set to 25 and 20 ppm, respectively, and scan window size was set to 6.

For the analysis of COVID-19 plasma samples, a project-independent public spectral library24 was used as described previously23. The Human UniProt83 isoform sequence database (UP000005640) was used to annotate the library. The library was first automatically refined based on the dataset in question at 0.01 global q-value (using the ‘Generate spectral library’ option in DIA–NN). DIA–NN performs such refinement by finding the highest-scoring identification for each library precursor across all runs in the experiment, and then replacing the library data with the empirically observed spectrum and retention time.

Empirical FDR estimation with two-species library

Because FDR calculations are software- and acquisition-mode-specific, thus potentially affecting benchmarking results, we also compared Scanning SWATH data with conventional stepped SWATH using the two-species library approach, which estimates true-positive calls in an unbiased fashion on the basis of an empirically measured FDR8,22. We augmented the human library with A. thaliana precursors, obtained from ProteomeXchange (dataset PXD012710, Arabidopsis proteome spectral library, ‘Arabidopsis_Library_TripleTOF5600_Spectronaut.xls’), as negative controls. Peptides that matched both the UniProt83 human canonical proteome (UP000005640) and the UniProt A. thaliana canonical proteome (UP000005648) were removed from the library. Spectra and retention times in the merged human/A. thaliana library were replaced with in silico predicted values whenever possible using the deep-learning-based prediction integrated in DIA–NN. Empirical FDR was estimated as previously described22. In short, empirical FDR is the ratio of A. thaliana precursors and human precursors identified multiplied by the ratio of human precursors and A. thaliana precursors in the library (only precursors ranging 400–900 m/z were considered).
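The empirical FDR definition in the last sentence can be written as a one-line function. In the Python sketch below, the function and argument names are illustrative, and the counts in the example are hypothetical rather than values from this study.

```python
def empirical_fdr(n_arabidopsis_ids, n_human_ids, n_human_lib, n_arabidopsis_lib):
    """Two-species empirical FDR.

    (A. thaliana IDs / human IDs) x (human library size / A. thaliana library size).
    The second factor corrects for the unequal number of target and
    negative-control precursors in the merged library.
    """
    return (n_arabidopsis_ids / n_human_ids) * (n_human_lib / n_arabidopsis_lib)


# Hypothetical counts, restricted to precursors in the 400-900 m/z range.
fdr = empirical_fdr(
    n_arabidopsis_ids=30,
    n_human_ids=30_000,
    n_human_lib=100_000,
    n_arabidopsis_lib=50_000,
)
print(f"empirical FDR: {fdr:.2%}")
```

With these made-up numbers, 30 false A. thaliana calls among 30,000 human identifications, scaled by a twofold larger human library, give an empirical FDR of 0.2%.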

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.