Precise label-free quantitative proteomes in high-throughput by microLC and data-independent SWATH acquisition

While quantitative proteomics is a key technology in biological research, its routine application in industry and diagnostics is still limited by moderate throughput, data consistency and robustness. In part, these restrictions emerge from the dependency of proteomics on nanolitre/minute flow rate chromatography, which enables high sensitivity but is difficult to handle on large sample series, and from the stochastic nature of data-dependent acquisition strategies. Here we establish and benchmark a label-free, quantitative proteomics platform that uses microlitre/minute flow rate chromatography in combination with data-independent SWATH acquisition. Being able to largely compensate for the loss of sensitivity by exploiting the analytical capacities of microflow chromatography, we show that microLC-SWATH-MS is able to precisely quantify up to 4000 proteins in an hour or less, enables the consistent processing of sample series in high throughput, and achieves quantification precision comparable to targeted proteomic assays. MicroLC-SWATH-MS can hence routinely process hundreds to thousands of samples to systematically create precise, label-free quantitative proteomes.


INTRODUCTION
Mass spectrometry-based proteomics has emerged as a prime technology for identifying and quantifying proteins, determining activity, turnover and modification state, and closing gaps in structural biochemistry [1][2][3]. The routine application of quantitative proteomics in diagnostics [4] and industrial settings is so far hampered by low throughput (few samples/day), moderate batch-to-batch stability and intensive hardware maintenance, all of which render proteomics expensive. First, these limits emerge in part from the dependency of proteomics on nanolitre/minute flow rate chromatography (nanoLC) and the necessary electrospray devices (nanospray). Only a few years ago, nanoLC technology paved the way for a breakthrough by enabling a level of sensitivity that allows the routine detection of thousands of peptides in a single sample [5], but the minimal flow rate as well as the small capillary and emitter diameters are difficult to handle. Further, nanoLC is relatively slow and susceptible to many kinds of technical distortion [6,7]. Second, on large sample series, further limits arise from the stochastic elements of data-dependent acquisition strategies. As these do not necessarily quantify each peptide in each replicate or sample, the set of quantifiable proteins becomes less consistent the larger the sample series becomes.
Some of the original limitations of proteomics, however, no longer necessarily apply and can be circumvented by new technical developments. First, analytical hardware has progressed substantially in recent years. Liquid chromatography can now be operated at much higher pressure, allowing higher peak capacity and better resolution in a shorter time. Furthermore, mass spectrometers have gained acquisition speed, sensitivity and resolution, so that they can handle faster chromatography and higher dilution rates. Second, data-independent acquisition strategies such as MSE [8,9] or SWATH-MS [10,11] have achieved the necessary precision, so that they routinely overcome the stochastic elements in data acquisition. In parallel, a decade of proteome research has revealed that the typical cellular response involves multiple proteins, which enables the identification of biological responses without the need for full proteome coverage, and that powerful machine learning methods detect new biological patterns when applied to large sample series [12][13][14]. As typical proteomic responses can be moderate in quantitative terms (fold changes of 1-3x being typical), these approaches require high quantitative precision in the datasets. For several applications, the ultimate goal of achieving full proteome coverage is hence of secondary importance compared to the pressing need for high quantitative precision and throughput, so that small concentration changes are reliably detected even in large sample series.
A potential solution for key limits in proteomics is capillary, or microflow, chromatography. MicroLC-MS is in principle less sensitive than nanoLC-based proteomics, as it operates at a flow rate 1-2 orders of magnitude higher than the nanoLC typical of proteomics. However, the higher flow rate renders microLC compatible with robust analytical-scale electrospray technologies that also allow a high level of control over ionization parameters. Furthermore, by using larger capillary-type columns and electrospray emitter diameters, microLC has shorter dead times, is easier to handle, is more robust, and provides a high run-time stability that allows sophisticated chromatography. Here, we optimize a proteomic setup that combines microlitre/minute flow rate chromatography (microLC) with data-independent SWATH acquisition, and test its suitability to complement classic setups for high-throughput proteomic applications. We find that by exploiting the larger sample capacity of microLC, its reduced sensitivity compared to nanoLC can to a large extent be compensated. While the technical advantages of a higher flow rate are preserved, high quantitative precision is achieved in combination with sequential windowed acquisition of all mass transitions (SWATH-MS) [15], which overcomes the undersampling problem of fast chromatography. Further, we present a computational batch correction strategy applicable to the quantitative proteome values, which enables the routine processing of hundreds to thousands of quantitative proteomes.

RESULTS
To find an optimal compromise between flow rate and sensitivity for microLC proteomics, we first evaluated their relationship on a combined nanoLC and microLC chromatographic setup, coupled to a mass spectrometer selected for its high (up to 100 Hz) acquisition speed so that fast chromatography can be handled (TripleTOF5600 [16]). Signal intensities were determined for standardized peptides (iRT standards [17]) covering flow rates from 300 nL/min to 10 µL/min, using 75 µm (0.3-0.7 µL/min) or 300 µm (1-10 µL/min) inner diameter columns. The 30-fold stepwise increase in flow rate reduced signal intensities by a factor of seven (Fig. 1A). A good compromise between signal and chromatographic quality was found at flow rates between 3 and 5 µL/min (Fig. 1B), and at 3 µL/min, signals were only reduced by a factor of 3.5 compared to nanoLC-MS (Fig. 1A, dotted lines). This decline in signal could be compensated by the higher capacity of the microLC columns, which allow loading of up to 15 µg whole-proteome tryptic digest (around 10x the amount that can be separated in nanoflow chromatography), while still allowing detection of >1200 proteins with as little as 2 µg digest (Fig. 1C). MicroLC-SWATH-MS hence requires a larger sample injection than conventional proteomic setups based on nanoLC. Although sample quantity is a common limitation in proteomics, this requirement of microLC is a problem in only a small subset of proteomic cases: typical proteomic sample preparation methods yield ~10-50 µL of digest, of which typically 1 µL to 2 µL are injected in nanoLC-MS/MS. In microLC-MS/MS, a higher injection volume of the same sample is simply used.
In combination with conventional shotgun data-dependent acquisition (DDA), protein identification capacity increased with gradient length, so that >30 minute microflow gradients were sufficient to identify >1000 proteins in a single injection (Fig. 1D). In combination with high pH reversed phase chromatography prefractionation, this setup identified 85 % of all expressed yeast open reading frames as detected previously [18]. The overlapping set of 3822 proteins was identified with some abundance bias against low-abundance proteins (Fig. 1E). Undersampling is a typical problem in DDA strategies. Sharper and better separated chromatographic peaks are analytically desirable; however, fast chromatography amplifies the problem of undersampling. To establish a precise quantitative workflow, we therefore combined microLC with SWATH-MS acquisition, a data-independent acquisition (DIA) strategy in which all precursors falling into an isolation window are fragmented simultaneously, and chromatograms are reconstructed computationally post-acquisition [15]. Cycle times of conventional SWATH-MS are in the range of 3 seconds and hence, in microLC, result in only 4-6 data points covering a peak with a typical 12 s full width at half maximum (FWHM), which would fall short of what is needed for a precise quantitative dataset. The SWATH regime was hence adapted to microLC by reducing the per-window dwell time to 40 ms (which, on the highly concentrated samples, was achieved without notable loss in signal and identification capacity (Suppl. Fig. 1)) and limiting segmented acquisition to the precursor-rich mass range of 400-850 m/z. While 85 % of precursors, enabling the quantification of 96 % of proteins, fall into this mass range (Suppl. Fig. 2), the modification allowed the cycle time to be shortened to 1.3 s, which made it possible to cover a typical microLC chromatographic peak with a critical number of 8-12 data points (Fig. 1F).
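The relationship between cycle time and peak sampling described above can be illustrated with a short calculation. The following is a minimal sketch, not the instrument vendor's method: the function names are hypothetical, per-scan overhead is ignored, and the duration of the survey scan is left as an assumed parameter. It counts cycles across the FWHM only; the base of a peak is wider, which is presumably why the ranges quoted in the text are somewhat higher.

```python
def swath_cycle_time(n_windows, dwell_s, survey_scan_s=0.0):
    """Approximate cycle time: one optional survey scan plus one dwell per window.

    Assumption: instrument overhead between scans is neglected.
    """
    if n_windows <= 0 or dwell_s <= 0:
        raise ValueError("n_windows and dwell_s must be positive")
    return survey_scan_s + n_windows * dwell_s


def points_per_peak(peak_width_s, cycle_time_s):
    """Number of acquisition cycles that fit into a chromatographic peak width."""
    if cycle_time_s <= 0:
        raise ValueError("cycle_time_s must be positive")
    return peak_width_s / cycle_time_s


FWHM_S = 12.0  # typical microLC peak width (FWHM) quoted in the text

# Conventional SWATH cycle (about 3 s) versus the adapted regime (1.3 s).
conventional_points = points_per_peak(FWHM_S, 3.0)
adapted_points = points_per_peak(FWHM_S, 1.3)

# Time spent on the fragment-ion windows alone for 29 windows at 40 ms dwell.
window_time_s = swath_cycle_time(29, 0.040)

print(f"conventional: {conventional_points:.1f} points across the FWHM")
print(f"adapted:      {adapted_points:.1f} points across the FWHM")
print(f"29 windows x 40 ms: {window_time_s:.2f} s per cycle (without survey scan)")
```

Shortening the dwell time and narrowing the mass range both reduce the cycle time, which is what raises the number of points per peak.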
On this setup we acquired replicate injections of 10 µg of a whole-cell yeast digest using 60 min gradients at 3 µL/min and, for comparison, used different strategies for SWATH-MS data extraction. SWATH spectral libraries were generated using proteome prefractionation (following original approaches [15,19]), repeated injection of a sample mixture matching the actual sample matrix ('exhaustion'), or pseudo-MS/MS correlative precursor-fragment feature extraction using DIA-Umpire [20]. All libraries were generated using the same protocol [19] at <1 % FDR using a combination of the X! Tandem [21] and Comet [22] search engines. Extracting a typical microLC-SWATH run using the spectral library created by prefractionation, we could quantify 1766±46 yeast proteins using 34 x 25 m/z SWATH windows, or 1422±53 proteins when using 29 x 16 m/z windows (Fig. 1G and Suppl. Fig. 3). Analysis of the same data using the library generated by exhaustion allowed quantification of 1271±5 and 1157±13 proteins, respectively, which was similar to a public library generated by nanoLC-MS/MS (Spectronaut [23] repository) that yielded 1256±23 and 1118±26 proteins. In the absence of a separately acquired spectral library, DIA-Umpire quantified 952±0 and 890±2 proteins (Fig. 1G and Suppl. Fig. 3). In terms of peptide quantification as well, peak reconstruction on the basis of a prefractionation-based library performed best, followed by the public and exhaustion-based libraries (Suppl. Fig. 4 and Suppl. Fig. 5). Next, we tested the performance of microLC-SWATH-MS on a standardized whole-proteome human cell line (K562) digest, by extracting data using three publicly available spectral libraries generated by combining multiple tissues and fractionation [24] or by repetitive injection of tissue-specific cell digests of HEK293 or HeLa cells (Spectronaut [23] repository).
MicroLC-SWATH-MS achieved quantification of 3951±205, 1832±74 and 2007±63 proteins, respectively, from single injections of the unfractionated K562 digest, with peptide numbers following the same trend (Fig. 1I, Suppl. Fig. 7).
The microLC-SWATH-MS setup optimized in this way yielded precise quantities in label-free proteomics. Median coefficients of variation (CVs) for replicate injections in all acquisition strategies and analysis libraries were in the range of 5.4-8.8 % for both peak areas and protein fold changes for the yeast samples (Fig. 1H and Suppl. Fig. 6), and 5.5-7 % for the standardized human sample. The precision was similar for both low and high concentrated peptides, over the full range of five orders of magnitude (Fig. 1J, Suppl. Fig. 8, Suppl. Fig. 9 and Suppl. Fig. 10). Of note, on our dataset, the library generated with DIA-Umpire quantified fewer proteins, but yielded the best precision values (Fig. 1H). Being label-free, microLC-SWATH-MS thus clearly equals, or even exceeds, the precision values typically obtained in a label-free quantitative proteomic experiment [15,25,26,27].
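For readers who want to reproduce this kind of precision estimate, the median CV across replicate injections can be computed as sketched below. This is an illustration under stated assumptions, not the authors' analysis script: the function names and the toy intensities are hypothetical, and the CV is taken as the sample standard deviation divided by the mean.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation relative to the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a mean of zero")
    return 100.0 * stdev(values) / m


def median_cv(replicates_by_analyte):
    """Median of the per-analyte CVs; analytes with fewer than 2 values are skipped."""
    cvs = [
        coefficient_of_variation(values)
        for values in replicates_by_analyte.values()
        if len(values) >= 2
    ]
    if not cvs:
        raise ValueError("no analyte has at least two replicate values")
    return median(cvs)


# Hypothetical peak areas for three peptides measured in three replicate injections.
example_areas = {
    "peptide_1": [100.0, 105.0, 95.0],
    "peptide_2": [2000.0, 2100.0, 1900.0],
    "peptide_3": [10.0, 11.0, 9.0],
}

print(f"median CV: {median_cv(example_areas):.1f} %")
```

Using the median rather than the mean keeps a few poorly quantified analytes from dominating the summary value.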
Applied to the large sample series that are enabled by the high throughput of microLC-SWATH-MS, such high precision values facilitate the detection of small and moderate protein concentration changes that would be lost to error propagation with a technology of low precision. On large proteomic sample series, however, additional limits emerge from unavoidable batch effects. If left uncorrected, these can exceed the biological signal [28,29]. To monitor batch characteristics and to develop countermeasures, we analyzed 296 Saccharomyces cerevisiae proteomes (38 yeast cultures grown in nine replicates, 3 biological x 3 technical) and acquired them in three batches. We repeatedly included a mixture of all samples as a quality control (QC) sample every 10-12 injections, which resulted in a total of 327 whole-proteome samples processed (Fig. 1L). MicroLC was, even without retention time normalization, highly stable (standard deviation of apex retention times 17.7 s in 60 min gradients over the 327 runs, Fig. 1K, inset) both within and across batches (Fig. 1L). Within the batches, each spanning 100-116 whole-proteome digests acquired over a period of around 9 days (net acquisition time 6 days), median CVs of QC samples were around 12 %, while QC CVs between batches were 17.4 % (Fig. 1M). Correction for chromatographic drifts was effectively accomplished by the use of retention time standards and linear normalization [17]. To address the more complicated batch effects that emerge from signal intensity drifts and fluctuations in the ion traces, we adapted a strategy based on models developed for gene expression analysis [28], making use of the QC samples to monitor instrument performance. Intensity correction reduced signal variability by 43 %, which largely corresponds to the batch effect, while maintaining each individual strain's proteome profile (Fig. 1N, Fig. 1O, Suppl. Fig. 11 and Suppl. Fig. 12).
Applying these strategies to the 327 whole proteomes yielded the quantification of 1212±100 proteins in each sample, with the average peptide quantified with a CV of 22.3±5.4 % across all technical and biological replicates.
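The idea of using interspersed QC samples to remove intensity batch effects can be illustrated with a deliberately simplified stand-in. The study itself uses a model-based correction adapted from gene expression analysis; the sketch below is not that method. It only shifts each batch in log2 space so that the QC medians of all batches coincide, for a single peptide. All names and numbers are hypothetical.

```python
import math
from statistics import median


def qc_median_center(intensities, batches, is_qc):
    """Align batches for one peptide by centering on the QC median of each batch.

    Simplified illustration only; assumes strictly positive intensities and
    at least one QC injection per batch.
    """
    logs = [math.log2(x) for x in intensities]

    # Collect the log2 intensities of the QC injections per batch.
    qc_logs = {}
    for value, batch, qc in zip(logs, batches, is_qc):
        if qc:
            qc_logs.setdefault(batch, []).append(value)

    missing = set(batches) - set(qc_logs)
    if missing:
        raise ValueError(f"batches without QC samples: {sorted(missing)}")

    batch_level = {batch: median(values) for batch, values in qc_logs.items()}
    target = median(list(batch_level.values()))  # common reference level

    # Remove each batch's QC offset, then move to the common reference level.
    return [
        2.0 ** (value - batch_level[batch] + target)
        for value, batch in zip(logs, batches)
    ]


# Toy data: batch 2 reads twice as high as batch 1 for the same material.
intensities = [100.0, 100.0, 200.0, 200.0, 200.0, 400.0]
batches = [1, 1, 1, 2, 2, 2]
is_qc = [True, True, False, True, True, False]

corrected = qc_median_center(intensities, batches, is_qc)
print([round(x, 1) for x in corrected])
```

After centering, the two biological samples (indices 2 and 5) have the same corrected intensity, because their ratio to the QC sample of their own batch is identical.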
In conclusion, we show that microLC-MS/MS in combination with sample fractionation achieves protein identification characteristics competitive with those of canonical proteomics platforms, while in combination with data-independent SWATH-MS acquisition, the platform can capture a quantitative snapshot of a yeast or human proteome in an hour or less and in high throughput. Achieving high precision and signal stability in label-free proteomics, microLC-SWATH-MS enabled the processing of hundreds of samples in series within a few days. The performance characteristics are illustrated by quantifying 1210±100 proteins across 327 whole proteomes, achieved in a net acquisition time of only 17 days, and with CV values comparable to those of FDA-approved targeted assays. The technology is hence ideal for large proteomic sample series, and for applications where precision and quantitative robustness are the key objectives. These include, for instance, the application of machine learning strategies to detect novel biological patterns, the analysis of biological time series, and comparative studies in basic research and diagnostics that require large sample series. Indeed, the high flow rate renders microLC-SWATH-MS a highly robust proteomic technology, with low instrument downtime and few maintenance cycles, so it is ideal for laboratories that handle large sample numbers in diagnostics and for data-driven biology.

A. Dependency of signal intensity on flow rate in a proteomic experiment
Combined intensities of standardized peptides (iRT) were determined using nano, low-micro and high-micro flow regimes on an Eksigent 425 system equipped with the three respective flow modules and recorded on a TripleTOF5600 mass spectrometer. Signal intensity is a function of the dilution rate, with a factor of 0.3 between 0.3 µL/min and 3 µL/min.

B. Peak characteristics on a microLC setup
Average precursor peak shapes of 5 iRT peptides determined using flow rates of 1-10 µL/min on an Eksigent 3C18-CL-120 column. Chromatography is stable and reproducible at flow rates >1 µL/min. Shaded areas represent the standard deviation of signal intensity.

C. 2 µg of tryptic digested protein is sufficient to quantify >1200 yeast proteins in microLC-SWATH-MS.
1-15 µg of yeast protein tryptic digest were injected and separated using a 60 min water-to-acetonitrile gradient at a flow rate of 3 µL/min. 914 proteins were identified with 1 µg, 1219 proteins with 2 µg, 1428 proteins with 5 µg and 1504 proteins with 15 µg digested protein. Analysis was conducted in Spectronaut v8.0, using a library generated according to Schubert et al. [19] from a fractionated yeast sample.

D. A 30 min LC gradient is sufficient to quantify >1000 proteins in microLC-SWATH-MS.
5 µg of yeast protein tryptic digest were injected and separated using water-to-acetonitrile chromatographic gradients of 10-90 min at a flow rate of 3 µL/min. Extraction of the SWATH spectra yielded quantifiable peptides for 740 proteins (10 min), 946 proteins (20 min), 1170 proteins (30 min), 1322 proteins (45 min), 1420 proteins (60 min) and 1455 proteins (90 min). Analysis was conducted in Spectronaut v8.0, using a library generated according to Schubert et al. [19] from a fractionated yeast sample.

E. SWATH spectral library generation using sample fractionation or exhaustion.
In the sample fractionation approach, a yeast tryptic digest was first separated by high pH reverse phase chromatography on an analytical HPLC as described in methods, and then analyzed in DDA mode with m/z (gas phase) fractionation at a flow rate of 3 µL/min. In the sample exhaustion approach, a yeast tryptic digest was injected repeatedly until protein identification was saturated. When comparing the proteins identified in both approaches with the published abundances of yeast proteins [18], the most abundant proteins were identified in both approaches, while proteins with low expression levels were only identified using fractionation. Inset: In total, 3822 (84 %) or 1037 (23 %) out of 4517 expressed yeast proteins [18] were identified using either method, respectively.

F. To illustrate peak coverage, an extracted ion chromatogram (XIC) of the peptide TPVITGAPYYER recorded in microLC-SWATH mode using either 34 x 25 m/z or 29 x 16 m/z windows is shown. NanoLC-optimized SWATH [15] of 34 x 25 m/z with a cycling time of 3.3 s leads to a coverage of 5 points per peak. When the mass range covered is limited to 400-850 m/z and the accumulation time is reduced to 40 ms, the cycling time is 1.32 s and peaks are covered by 11 data points, while precursors for 96 % of proteins are still captured.

G. Protein quantification in microLC-SWATH-MS using different strategies to construct spectral libraries.
A yeast tryptic digest was analyzed using microLC (0.3 mm x 250 mm Triart C18, 3 µL/min, 60 min gradient) SWATH-MS by repeated injection of 10 µg digest (9x). Data were processed with Spectronaut v8.0 using SWATH libraries generated by either sample fractionation or sample exhaustion (matrix-matched library), using a spectral library recorded in an unrelated lab (Spectronaut repository), or with a library generated by DIA-Umpire from the SWATH traces without physically recording a separate spectral library. Data analysis on the basis of the fractionation library allowed quantification of 1766 proteins, while 1271 proteins were quantified on the basis of the exhaustion library. The unrelated SWATH library quantified 1256 proteins, and DIA-Umpire 952 proteins. All libraries except the unrelated one (obtained from the Spectronaut repository) were generated according to Schubert et al. [19].

H. Technical variability of yeast protein quantification is low in microLC-SWATH-MS irrespective of data extraction
Fold change (random reference) variability of 677 proteins present in all data sets was compared across the nine replicates. Median coefficients of variation are 7.3 % and 8 % for libraries generated using the fractionation and exhaustion approaches, respectively, 7.6 % for an unrelated yeast library, and 5.4 % for a library generated by DIA-Umpire. All libraries except the one obtained from the Spectronaut repository were generated according to the protocol of Schubert et al. [19].

I. Human protein quantification using microLC-SWATH
A tryptic digest of a whole-cell protein extract from human K562 cells was analyzed using microLC (0.3 mm x 250 mm Triart C18, 3 µL/min, 60 min gradient) coupled to a TripleTOF5600 MS operating in SWATH mode by repeated injection of 3 µg digest (6x). Data were processed with Spectronaut v8.0 using a SWATH library obtained from the SWATHAtlas repository [24], or using SWATH libraries generated by repeated analysis of HEK293 or HeLa cell extracts (Spectronaut repository). Data analysis using a rich library allows quantification of 4169 proteins, while 2031 proteins can be quantified using a HEK293 and 1906 using a HeLa library, respectively.

J. Technical variability of human protein quantification is low in microLC-SWATH-MS irrespective of data extraction
Fold change variability (random reference) of 726 proteins present in all data sets was compared across the six replicates. Median coefficients of variation are around 7 % for all libraries.

K. Retention time stability of microLC-SWATH-MS over 327 runs
Correlation between measured apex retention time and predicted retention time. Shown is a representative yeast sample acquired in SWATH mode.
Inset: Mean retention time standard deviation of 6 iRT peptides across 327 injections is 17.7 s.

L. Retention times are very stable in microLC-SWATH-MS.
327 yeast tryptic digest samples spiked with iRT peptides were analyzed on a microLC-SWATH-MS (nanoACQUITY/TripleTOF5600) system in three batches over 3 x 9 days (net acquisition time 16 days, grey vertical lines). Retention times of iRT peptides are shown over time (colored lines), and the retention time coefficient of variation for all peptides is lower than 2 % over the whole period. Across the experiment, a control sample was injected repeatedly (red vertical lines) as quality control.

M. Technical variability in microLC-SWATH-MS is low for replicates acquired over a time period of 27 days.
Quality control samples described in L. were analyzed using Spectronaut v8.0, and the coefficient of variation for fold changes of 8686 peptides was calculated in batch 1 (green), batch 2 (orange), batch 3 (purple) or across batches (magenta). Intra-batch CVs were around 12 %, while variability over the entire 27 day period was 17.4 %.

N. Variability of 327 yeast proteomes before batch correction.
38 yeast strains were grown in three batches, and each batch was acquired as three technical replicates in SWATH-MS together with 10-12 evenly interspersed quality control samples. In a PCA, proteomes cluster according to the acquisition batch, with color-coded technical replicates clustering together.

O. Variability of 327 yeast proteomes after batch correction.
After batch correction based on the combined quality control sample profiles, clustering according to batches is reduced, and proteomes cluster according to the color-coded yeast strain. Inset: Median coefficients of variation of peptide intensities between all replicates (technical and biological, 9 replicates per strain) are 39.7 ± 3.2 % before batch correction and 22.3 ± 5.4 % after batch correction.

Materials, solutions and reagents
Chemicals and reagents were obtained from Sigma unless stated otherwise, and UPLC/MS grade chromatographic solvents were from Greyhound.

Sample preparation for mass spectrometry
A standardized yeast sample was generated by growing the prototrophic S. cerevisiae strain YSBN1 [30] in Yeast Nitrogen Base (YNB) medium without amino acids, containing 2 % glucose, until mid-exponential phase. Cells were harvested by centrifugation and snap-frozen in aliquots equaling 10 OD. Sample preparation was performed based 31

Chromatography
Chromatographic separation was performed either on an Ekspert NanoLC 425 system (SCIEX) for combined nano and micro flow analysis, or on a nanoACQUITY system (Waters) for microflow-only sample series. In nano flow, the NanoLC 425 system was equipped with a nano flow module, and samples were first loaded onto a trap column (ChromXP C18, 3 µm, 120 Å, 0.35 x 0.5 mm) by running the system isocratically at a flow rate of 5 µL/min for 6 min with 0.1 % formic acid (FA) in water. Peptides were then eluted onto the analytical column (3C18-CL-120, 3 µm, 120 Å, 0.075 x 150 mm, Eksigent) and separated on a linear gradient of 2-30 % 0.1 % FA in acetonitrile (ACN) in 25 min. For micro flow analysis, the same system was equipped with a low micro flow module (1-5 µL/min) or a high micro flow module (5-10 µL/min) and set up for direct injection onto an analytical column (3C18-CL-120, 3 µm, 120 Å, 0.3 x 150 mm, Eksigent). Separation was performed on a linear gradient of 2-30 % 0.1 % FA in ACN in 25 min. For micro flow on the nanoACQUITY system, the sample manager was set up in direct injection mode and equipped with a Triart C18 column (120 Å, 3 µm, 0.3 mm x 250 mm, YMC). After injection of samples onto the analytical column, peptides were separated on the linear gradients detailed in Suppl. table 1.
To acquire the spectral libraries used in SWATH data extraction, the mass spectrometer was operated in information-dependent acquisition (IDA) and high-sensitivity mode, with first a 250 ms TOF MS survey scan over a mass range of 400-1250 m/z, followed by 100 ms MS/MS scans of 20 ion candidates per cycle with dynamic background subtraction. The selection criteria for the parent ions were an intensity greater than 150 cps and a charge state between 2 and 4. The dynamic exclusion duration was set to 1 s. Collision-induced dissociation was triggered by rolling collision energy. For generation of SWATH libraries by sample fractionation, precursor-rich fractions were additionally injected twice, with gas phase fractionation of 400-650 and 650-1250 m/z. For data-independent acquisition, the instrument was operated in SWATH mode with selection windows detailed in Suppl.

Data analysis and SWATH library generation
All SWATH assay libraries were built following Schubert et al. [19]. Briefly, spectral data acquired in IDA mode were centroided using qtofpeakpicker [32]. Centroided files were searched with X! Tandem [21] and Comet [22] against a database of annotated yeast proteins that included reversed decoy peptides. Search results were scored using PeptideProphet and combined with iProphet. Mayu [33] was used to estimate the iProphet probability cutoff that controls the protein identification false discovery rate (FDR <1 %). The final spectral library was assembled using SpectraST [34] by retaining spectra above the FDR-controlled iProphet cutoff and normalizing chromatography to the iRT peptide retention time reference. SpectraST output was then converted to a tsv format suitable for Spectronaut, retaining the 6 most intense transitions of y and b ions, using spectrast2tsv from msproteomicstools [35]. For large-scale data analysis of the 327 yeast samples, a minimal consensus library constructed from exhaustion-based IDA acquisitions was compiled in Spectronaut. In the DIA-Umpire approach, we extracted precursor-fragment features by applying the signal extraction module of the DIA-Umpire workflow, using the default recommended parameters for the TripleTOF5600 instrument, to data acquired in SWATH mode by sample exhaustion of the yeast proteome. The generated pseudo-MS/MS mgf files were converted into mzXML, subjected to database search and processed as described above to generate a SWATH assay library. All SWATH data quantification was performed in Spectronaut (v. 8.0.9600, Biognosys) using default settings. Publicly available SWATH libraries were obtained from the Biognosys library repository or from SWATHAtlas [24].
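The transition selection step (keeping the 6 most intense y and b fragment ions per precursor) can be pictured as a simple filter-and-rank operation. The sketch below is only an illustration of that idea and does not reproduce the logic or options of the conversion tool used in the study; the function name, the dictionary layout and the example fragments are assumptions.

```python
def top_transitions(fragments, n=6, allowed_types=("y", "b")):
    """Keep the n most intense fragments whose ion type is in allowed_types.

    Each fragment is a dict with the hypothetical keys
    'ion_type', 'ordinal' and 'intensity'.
    """
    eligible = [f for f in fragments if f["ion_type"] in allowed_types]
    ranked = sorted(eligible, key=lambda f: f["intensity"], reverse=True)
    return ranked[:n]


# Hypothetical fragment list for one precursor.
example_fragments = [
    {"ion_type": "y", "ordinal": 4, "intensity": 900.0},
    {"ion_type": "y", "ordinal": 5, "intensity": 1500.0},
    {"ion_type": "y", "ordinal": 6, "intensity": 1200.0},
    {"ion_type": "y", "ordinal": 7, "intensity": 300.0},
    {"ion_type": "b", "ordinal": 2, "intensity": 800.0},
    {"ion_type": "b", "ordinal": 3, "intensity": 450.0},
    {"ion_type": "b", "ordinal": 4, "intensity": 100.0},
    {"ion_type": "a", "ordinal": 2, "intensity": 2000.0},  # excluded: not y or b
]

selected = top_transitions(example_fragments)
for fragment in selected:
    print(fragment["ion_type"], fragment["ordinal"], fragment["intensity"])
```

Restricting the assay to a fixed number of intense, well-defined fragment types keeps the library compact and the extracted signals comparable between precursors.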
For visualization of chromatographic peaks, data for selected peptides were analyzed in Skyline [36] (v. 3.5.0.9191) with the SWATH isolation windows detailed in Suppl. table 2, and chromatograms of precursors and products were exported as text files.
Post-processing was conducted in R [37] by first removing precursors from all samples where the median Q-value was > 0.01, and then setting all remaining precursor measurements with a Q-value > 0.01 to NA. Injection differences were corrected by a robust sum approach, and peptides belonging to one protein were selected based on closest fold change correlation. To account for confounding effects related to acquisition dates, we performed batch correction by introducing QC samples into the experimental design. External standard QC samples were prepared as a mixture of all injected samples and were measured every 10-12 samples. Each MS acquisition batch had >10 QC samples, allowing correction for the most evident batch effects attributed to the acquisition date (Suppl. Fig. 11). Signal correction was performed using the ComBat approach [38] as implemented in the R sva package [39]. Plotting was performed with the ggplot2 package [40].
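The two-stage Q-value filter described above can be sketched as follows. The original post-processing was done in R; this Python version is an illustrative re-expression, not the authors' script. The data layout (one list of per-sample values per precursor) and all names are assumptions, and `None` plays the role of NA.

```python
from statistics import median


def filter_by_qvalue(quantities, qvalues, cutoff=0.01):
    """Apply a two-stage Q-value filter to precursor quantities.

    Stage 1: drop precursors whose median Q-value across samples exceeds cutoff.
    Stage 2: for the remaining precursors, mask single measurements whose
             Q-value exceeds cutoff by replacing them with None.
    """
    filtered = {}
    for precursor, q_list in qvalues.items():
        if median(q_list) > cutoff:
            continue  # unreliable across the sample series, remove entirely
        filtered[precursor] = [
            value if q <= cutoff else None
            for value, q in zip(quantities[precursor], q_list)
        ]
    return filtered


# Hypothetical quantities and Q-values for three precursors in three samples.
example_quantities = {
    "precursor_a": [10.0, 11.0, 12.0],
    "precursor_b": [5.0, 6.0, 7.0],
    "precursor_c": [1.0, 2.0, 3.0],
}
example_qvalues = {
    "precursor_a": [0.001, 0.002, 0.003],  # kept unchanged
    "precursor_b": [0.001, 0.050, 0.002],  # kept, second value masked
    "precursor_c": [0.020, 0.050, 0.001],  # median above cutoff, removed
}

filtered = filter_by_qvalue(example_quantities, example_qvalues)
print(filtered)
```

Filtering on the median first removes precursors that are unreliable throughout the series, while the per-measurement mask keeps otherwise reliable precursors from contributing individual low-confidence values.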