a The ProteomeTools resource was extended in this study by ~305 k non-tryptic synthetic peptides consisting of ~169 k HLA class I, ~73 k HLA class II, ~32 k AspN, and ~31 k LysN peptides. All peptides were systematically characterized by multimodal LC-MS/MS. All data were subsequently used for training the 2020 Prosit fragment intensity prediction model. b Proportional Venn diagrams of HLA class I (top) and HLA class II (bottom) peptides in ProteomeTools (blue), SysteMHC Atlas (light red), IEDB (gray), and B.-Sternberg et al.11 (white). c Number of peptides (log10 color scale, white to dark blue) synthesized for the ProteomeTools resource, sorted by N-terminal (y-axis) and C-terminal (x-axis) amino acid, without (top) and with (bottom) the extension of non-tryptic peptides from this study. d Mirror spectrum of the singly charged non-tryptic peptide TSGYGQSSYSSY acquired by B.-Sternberg and Bräunlein et al.11 (endogenous peptide, top) and ProteomeTools (synthetic peptide, bottom). Fragment ion peaks with and without neutral losses are annotated in blue, red, green, and orange for b-, y-, a-, and internal fragment ions, respectively. The spectral similarity between the two spectra, measured by the normalized spectral contrast angle, and the Andromeda matching score of the top spectrum are shown at the top. e Boxplots of Andromeda scores for the best MS/MS identification per precursor for HLA class I (dark blue), HLA class II (light blue), AspN (yellow), and LysN (red) peptides for different fragmentation settings (HCD, CID, ETD, EThcD, ETciD) and mass analyzers (FTMS: Orbitrap mass analyzer, ITMS: ion trap mass analyzer). The number of spectra (n) and the median Andromeda score (median) are depicted at the top and bottom of each boxplot. The box indicates the interquartile range (IQR) and the black line marks the median. Notches extend to median ± 1.58 × IQR/sqrt(n); whiskers and outliers outside the IQR are not shown. Raw and analysis data are available from the PRIDE repository with identifier PXD021013.
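
The two summary statistics named in panels d and e can be illustrated with a minimal Python sketch. The caption does not state the formulas, so this assumes the commonly used definition of the normalized spectral contrast angle, 1 − 2·arccos(cos θ)/π, applied to intensity vectors already aligned to the same fragment ions, and notch limits of median ± 1.58 × IQR/sqrt(n). The function names `spectral_contrast_angle` and `notch_limits` are hypothetical, and the quartile method (`"inclusive"`) is an assumption that may differ from the one used to draw the figure.

```python
import math
import statistics
from typing import Sequence, Tuple


def spectral_contrast_angle(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized spectral contrast angle for two aligned intensity vectors.

    Assumed definition: 1 - 2 * arccos(cosine similarity) / pi.
    Returns 1.0 for identical directions and 0.0 for orthogonal vectors.
    """
    if len(a) != len(b):
        raise ValueError("intensity vectors must cover the same fragment ions")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum shares no signal with anything
    cosine = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


def notch_limits(scores: Sequence[float]) -> Tuple[float, float]:
    """Lower and upper notch limits: median -/+ 1.58 * IQR / sqrt(n)."""
    # Quartile method is an assumption; plotting libraries differ here.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    median = statistics.median(scores)
    half_width = 1.58 * (q3 - q1) / math.sqrt(len(scores))
    return median - half_width, median + half_width


if __name__ == "__main__":
    # Toy intensities, not taken from the data set.
    print(spectral_contrast_angle([1.0, 0.5, 0.0], [0.9, 0.6, 0.1]))
    print(notch_limits([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Under these assumptions, a value close to 1 in panel d indicates near-identical relative fragment intensities, and non-overlapping notches in panel e suggest that two medians differ.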

